Server Clusters : Cluster Configuration Best Practices
For Windows Server 2003
Microsoft Corporation
Published: January 2003
Abstract
This document describes several recommended cluster configurations, or topologies, among the many that are
possible for Microsoft® Windows® Server 2003. Each configuration is accompanied by a discussion of the relative
advantages and disadvantages for various models of application deployment.
The information contained in this document represents the current view of
Microsoft Corporation on the issues discussed as of the date of publication.
Because Microsoft must respond to changing market conditions, it should
not be interpreted to be a commitment on the part of Microsoft, and
Microsoft cannot guarantee the accuracy of any information presented after
the date of publication.
This document is for informational purposes only. MICROSOFT MAKES
NO WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION
IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user.
Without limiting the rights under copyright, no part of this document may be
reproduced, stored in or introduced into a retrieval system, or transmitted in
any form or by any means (electronic, mechanical, photocopying, recording,
or otherwise), or for any purpose, without the express written permission of
Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or
other intellectual property rights covering subject matter in this document.
Except as expressly provided in any written license agreement from
Microsoft, the furnishing of this document does not give you any license to
these patents, trademarks, copyrights, or other intellectual property.
© 2002. Microsoft Corporation. All rights reserved. Microsoft, SQL Server,
Windows, and Windows NT are either registered trademarks or trademarks
of Microsoft Corporation in the United States and/or other countries.
The names of actual companies and products mentioned herein may be the
trademarks of their respective owners.
Contents

Cluster and Application Deployments
Server Load
    Server Load – The Basics
Application Style
Application Deployments
Failover Policies
    Failover Pairs
    Hot-Standby Server
    N+I
    Failover Ring
    Random
    Customized control
Load Balancing
Cluster and Application Deployments
Historically, we have described clusters and applications as being active/active or active/passive, which is
a natural result of the fact that the earliest products supported only 2-node clusters. In some cases,
however, this has led to confusion because the terms can be interpreted in different ways depending on
whether the context is 1) how a given application instance is running, 2) how different instances of an
application are running or 3) whether the nodes in the cluster are performing useful work. To fully
understand the specific application and cluster deployments, we need to understand the following:
• How many instances of the application are actually up and running?
• How many instances of the same data are in the cluster?
• Do instances of the data move around?
• Do different instances of the same application run on different cluster nodes?
• Can different applications run on different cluster nodes?
• What kind of load does the application put on a server, and how will this load be re-distributed
after a failover?
Before describing how an application is deployed, we must first define what an application is. An
application, for the purposes of this document, is defined as the running code and data required to
provide a single service to an end-user or client. Consider a couple of different examples to illustrate the
point: A single instance of Microsoft®
Word running on a workstation is a single application instance. If
there are multiple instances of Word running, each instance is considered a different application instance.
There are, however, more complex applications. For example, with Microsoft SQL Server™, a single
database is considered to be a single application instance. Independent databases are considered to be
different application instances. A single database, however, can be partitioned into multiple SQL Server
instances and "tied together" using the SQL Server query engine. In this case, the set of SQL Server
instances that are tied together to provide a single database image is considered a single application
instance.
Fundamentally, there are five different attributes that together provide a complete view of the deployment
and can be used to reason about or characterize the deployment:
• Server load – How much of a server’s resources are consumed by the applications it
supports, and how will this resource utilization be affected by failover?
• Application style – Is the application a single monolithic application that runs on a single node
or is the complete application split into smaller pieces that can be deployed around the
cluster?
• Application deployment – How are the pieces of the application spread around the cluster in a
given deployment?
• Failover policies – How is the application configured to behave after a failover?
• Application implementation – How is the application itself implemented?
Server Load
When deploying an application, it is always important to consider what demands it will make on a server’s
resources. With clustering, there is a related issue that also needs to be taken into account: how is the
load re-distributed after a failover?
Server Load – The Basics
Consider one of the simplest cases: an active/active, 2-node file server cluster, with node A and node B
each serving a single share. If node A fails, its resources will move to node B, placing an additional load
on node B. In fact, if nodes A and B were each running at only 50% capacity before the failure, node B will
be completely saturated (100% of capacity) after the failover is completed, and performance may suffer.
While this situation may not be optimal, it is important to remember that having all of the applications still
running, even in a reduced performance scenario, is a 100% improvement over what you would have
without the high availability protection that clusters provide. But this does bring up the notion of risk, and
what amount of it you are willing to accept in order to protect the performance, and ultimately the
availability, of your applications.
We have intentionally chosen the worst case (an active/active, 2-node cluster with each node running a
single application that consumes half of the server’s resources) for the purpose of clarity. With an
additional node, the equation changes: there are more servers to support the workload, but if all three
nodes are running at 50% capacity and there are two failures, the single remaining server will simply not
be able to handle the accumulated load of the applications from both of the failed servers. Of course, the
likelihood of two failures is considerably less than that of a single failure, so the risk is mitigated
somewhat.
Nevertheless, the load/risk tradeoff must be considered when deploying applications on a cluster. The
more nodes in a cluster, the more options you have for distributing the workload. If your requirements
dictate that all of your clustered applications must run with no performance degradation, then you may
need to consider some form of active/passive configuration. But even in this scenario, you must consider
the risks of the various configurations. If you cannot accept even the slightest risk of any reduced
performance under any conditions whatsoever, you will need a dedicated passive node for each active
node.
If, on the other hand, you are convinced that the risk of multiple failures is small, you have other choices.
If you have a 4-node or 8-node cluster, you may want to consider an "N+I" configuration. N+I, which is
discussed in more detail in the N+I section later in this document, is a variant of active/passive in which
N nodes are active and I nodes are passive, or reserve, nodes. Typically, the value for I is less than the
value for N, and an N+I cluster topology can handle I failures before any performance degradation is
likely. The risk is that with
more than I failures, performance will likely decline, but once again, the likelihood of multiple failures is
increasingly remote.
For this reason, N+I clusters are a useful configuration that balances the hardware cost of having 100%
passive server capacity against the low risk of multiple cluster node failures.
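The arithmetic behind this tradeoff is easy to sanity-check. The following is an illustrative sketch, not part of the original article, that estimates the per-node load after a given number of failures in an N+I cluster, under the simplifying assumption that the failed nodes' work can be spread evenly across the survivors.

```python
# Illustrative estimate (not from the original article) of per-node load after
# failures in an N+I cluster, assuming the surviving nodes can share the failed
# nodes' work evenly. Real capacity planning must also consider indivisible
# workloads and per-application resource profiles.

def post_failover_load(active_nodes, spare_nodes, load_per_active, failures):
    """Approximate per-node load (fraction of capacity) after 'failures' nodes
    have failed in an N+I cluster whose N active nodes each start at
    'load_per_active' (0.0 to 1.0)."""
    survivors = active_nodes + spare_nodes - failures
    if survivors <= 0:
        raise ValueError("no surviving nodes")
    return (active_nodes * load_per_active) / survivors

# A 4+1 cluster with each active node at 75% load:
print(post_failover_load(4, 1, 0.75, failures=1))  # 0.75 -> I failures absorbed
print(post_failover_load(4, 1, 0.75, failures=2))  # 1.0  -> saturated beyond I
```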
Server Load – Some More Realistic Configurations
The scenarios above were intentionally simplistic, assuming that one application imposed a monolithic
load on each server, and thus its resource utilization could not be spread among more than one other
server in the event of a failover. That is often not the case in the real world, especially for file and print
servers, so we will take a look at some additional scenarios with a 4-node cluster, named ABCD, and
having nodes A, B, C, and D.
Typically, a single server will support the load of more than one application. If, under normal conditions,
each server were loaded at 25%, then the ABCD cluster could survive the loss of three members before a
likely loss of application availability, which would be nearly a worst-case scenario.
The following series of figures illustrates what would happen with the application load in a 4-node cluster
for successive node failures. The shaded, or patterned, areas indicate the capacity demands of the
running applications. Further, the example below assumes that the application load on any given server
is divisible, and can be redistributed among any of the surviving nodes.
Figure 1.1: Cluster under normal operating conditions (each node loaded at 25%)
Figure 1.2: Cluster after a single node failure. Note redistribution of application load.
Figure 1.3: Cluster after two node failures. Each surviving node is now approximately 50%
loaded.
Figure 1.4: After three node failures, single surviving node is at full capacity.
If each node were running at 75% capacity, then without sensible failover policies, even a single node
failure could result in loss of application availability. Depending on the application(s), however, you can
specify that, in the event of a server failure, some percentage of the applications should fail over to node
A, node B, node C, and node D. If the applications are spread evenly among the surviving nodes, then
this cluster could now survive the loss of a single machine, because one third of the failed server’s load
(one third of 75% is 25%) is allocated to each of the surviving three machines. The result is three fully
loaded servers (each of the nodes running at 75% capacity now has an additional 25%), but all
applications are still available.
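The redistribution just described can be modeled with a small calculation. This is a sketch only, assuming a perfectly divisible load and hypothetical node names and split fractions; in a real cluster the split is expressed through failover policies rather than code.

```python
# Sketch of spreading a failed node's (divisible) load across survivors by
# configured percentages, as described above. Node names and fractions are
# hypothetical; real clusters express this through failover policies.

def redistribute(loads, failed, split):
    """Spread the failed node's load over surviving nodes.
    'split' maps each surviving node to the fraction of the failed load it absorbs."""
    assert abs(sum(split.values()) - 1.0) < 1e-9, "fractions must sum to 1"
    new_loads = {node: load for node, load in loads.items() if node != failed}
    for node, fraction in split.items():
        new_loads[node] += loads[failed] * fraction
    return new_loads

# Four nodes each at 75% capacity; node A fails and its load is split evenly.
before = {"A": 0.75, "B": 0.75, "C": 0.75, "D": 0.75}
after = redistribute(before, failed="A", split={"B": 1/3, "C": 1/3, "D": 1/3})
print(after)  # B, C and D each end up at ~100% -- fully loaded, but still running
```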
As a variation on the example that was illustrated previously, the following series of figures depicts
what happens to a 4-node cluster in which each node is running at approximately 33% capacity. Further, in
this case, the application load is indivisible and cannot be spread among multiple other servers in the
event of a failover (perhaps it is a single application, or multiple applications that depend on the same
resource).
Figure 2.1: Cluster under normal operating conditions (each node loaded at approximately 33%)
Figure 2.2: Cluster after a single node failure. Note redistributed application load.
Figure 2.3: Cluster after second node failure. Each surviving node is now running at
approximately 66% capacity. Note that in the event of another node failure, this cluster will no
longer be capable of supporting all four of these applications.
Figure 2.4: After third failure, the single surviving server can only support three of the four
applications.
Another style of failover policy is "Failover Pairs", also known as "Buddy Pairs". Assuming each of the
four servers is loaded at 50% or less, a failover "buddy" can be associated with each machine. This
allows a cluster in this configuration to survive two failures. More details on Failover Pairs can be found in
the Failover Pairs section later in this document.
Taking the previous Failover Pair example, we can convert it to an active/passive configuration by loading
two servers at 100% capacity, and having two passive backup servers. This active/passive configuration
can also survive two failures, but note that under ordinary circumstances, two servers remain unused.
Furthermore, performance of these servers at 100% load is not likely to be as good as with the Failover
Pair configuration, where each machine is only running at 50% utilization. Between these two examples,
note that you have the same number of servers, the same number of applications, and the same
survivability (in terms of how many nodes can fail without jeopardizing application availability). However,
the Failover Pair configuration clearly comes out ahead of active/passive in terms of both performance
and economy.
Still using our ABCD 4-node cluster, consider what happens if we configure it as an N+I cluster (explained
in more detail in the N+I section later in this document) where three nodes are running at 100% capacity and
there is a single standby node. This cluster can survive only a single failure. As before, comparing it to the
example where each server is running at 75% capacity, you again have the same number of servers and
applications and the same survivability, but performance and economy can suffer when you have passive
servers backing up active servers running at 100% load.
Application Style
Now it is time to consider some aspects of the application’s design, and how that affects the way it is
deployed. Applications running in a cluster can be characterized as one of:
• Single Instance
In a single instance application, only one application instance is running in the cluster at any
point in time. An example of this is the DHCP service in a server cluster. At any point in time,
there is only one instance of the DHCP service running in a cluster. The service is made
highly available using the failover support provided with server clusters. Single instance
applications typically have state that cannot be partitioned across the nodes. In the case of
DHCP, the set of IP addresses that have been leased is relatively small, but potentially highly
volatile in a large environment. To avoid the complexities of synchronizing the state around
the cluster, the service runs as a single instance application.
Figure 3: Single instance application
In the example above, the cluster has four nodes. The single instance application, by definition,
can be running on only one node at a time.
• Multiple Instance
A multiple instance application is one where multiple instances of the same code or different
pieces of code that cooperate to provide a single service can be executing around the cluster.
Together these instances provide the illusion of a single service to an end-user or client
computer. There are two different types of multiple instance applications, depending on the
type of application and data:
o Cloned Application. A cloned application is a single logical application that consists of
two or more instances of the same code running against the same data set. Each
instance of the application is self-contained, thus enabling a client to make a request to
any instance of the application with the guarantee that regardless of the instance, the
client will receive the same result. A cloned application is scalable since additional
application instances can be deployed to service client requests and the instances can be
deployed across different nodes, thus enabling an application to grow beyond the
capacity of a single node. By deploying application instances on different nodes, the
application is made highly available. In the event that a node hosting application
instances fails, there are instances running on other nodes that are still available to
service client requests.
Typically, client requests are load-balanced across the nodes in the cluster to spread the
client requests amongst the application instances. Applications that run in this environment
do not have long-running in-memory state that spans client requests, since each client
request is treated as an independent operation and each request can be load-balanced
independently. In addition, the data set must be kept consistent across the entire application.
As a result, cloning is a technique typically used for applications that have read-only data or
data that is changed infrequently. Web server front-ends, scalable edge-servers such as
arrays of firewalls and middle-tier business logic fall into this category.
While these applications are typically called stateless applications (a slight misnomer: the
applications do have state, but that state does not span individual client requests), they can save
client-specific, session-oriented state in a persistent store that is available to all instances of the
cloned application. In that case, the client must be given a token or key that it presents with
subsequent requests, so that whichever application instance services each request can
associate the request with the appropriate client state.
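As a rough illustration of that token pattern (not taken from the article), the sketch below keeps per-client session state in a shared store keyed by a token, so any clone can service any request. The in-memory dictionary stands in for a persistent store reachable by every instance, and all names are hypothetical.

```python
# Sketch of the token pattern for "stateless" clones: per-client session state
# lives in a shared store keyed by a token, so any instance can service any
# request. The dictionary stands in for a persistent store reachable by all
# clones; names are illustrative.
import uuid

SHARED_SESSION_STORE = {}  # in practice: a database or a share visible to every clone

def start_session(client_data):
    """Create session state and return the token the client will present later."""
    token = str(uuid.uuid4())
    SHARED_SESSION_STORE[token] = client_data
    return token

def handle_request(token, request):
    """Any clone can run this: it looks the client's state up by token."""
    state = SHARED_SESSION_STORE.get(token)
    if state is None:
        return "unknown session"
    return f"handled '{request}' for {state['user']}"

token = start_session({"user": "alice", "cart": []})
print(handle_request(token, "add item"))  # works no matter which clone handles it
```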
The Microsoft Windows®
platform provides Network Load Balancing as the basic
infrastructure for building scale-out clusters with the ability to spray client requests across the
nodes. Application Center provides a single image view of management for these
environments.
Figure 4: Cloned Application
In the example above, the same application "App" is running on each node. Each instance of
the application is accessing the same logical data set, in this case the data set A-Z. This example
shows each instance having access to its own copy of the data set (created and kept consistent using
some form of staging or replication technique); alternatively, some applications can share a single
copy of the data set (where the data is made available to the cluster using a file share, for
example), as shown in Figure 5.
Figure 5: Cloned Application using a file share
o Partitioned Applications. Applications that have long-running in-memory state or have
large, frequently updated data sets cannot be easily cloned. These applications are
typically called stateful applications. The cost of keeping the data consistent across many
instances of the application would be prohibitive.
Fortunately, however, many of these applications have data sets or functionality that can be
readily partitioned. For example, a large file server can be partitioned by dividing the files
along the directory structure hierarchy or a large customer database can be partitioned along
customer number or customer name boundaries (customers from A to L in one database,
customers from M to Z in another database, for example). In other cases, the functionality
can be partitioned or componentized. Once an application is partitioned, the different
partitions can be deployed across a set of nodes in a cluster, thus enabling the complete
application to scale beyond the bounds of a single node. In order to present a single
application image to the clients, a partitioned application requires an application-dependent
decomposition, routing and aggregation mechanism that allows a single client request to be
distributed across the set of partitions and the results from the partitions to be combined into
a single response back to the client. For example, in a partitioned customer database, a
single SQL query to return all of the accounts that have overdue payments requires that the
query be sent to every partition of the database. Each partition will contain a subset of
customer records that must be combined into a single data set to be returned to the client.
Partitioning an application allows it to scale but does not by itself provide high availability. If a node
that is hosting a partition of the application fails, that piece of the data set or that piece of
functionality is no longer accessible.
Partitioning is typically application-specific, since the aggregation mechanism depends on
the type of application and the type of data returned. SQL Server, Exchange data stores and
DFS are all examples of applications and services that can be partitioned for scalability.
Figure 6: Partitioned Application – Data Partitioning (the data set is split into partitions A-F, G-M, N-R, and S-Z across the nodes)
In the example above, each node is running an instance of the same application against a
different piece of the complete data set. A single request from a client application
can span multiple instances, for example a query to return all the records in the database.
This splitting of the client request across the different application instances and the aggregation of the
partial results into a single response passed back to the client can be done either by the applications on
the server cooperating amongst themselves (for example, the SQL Server query engine) or by
the client (as is the case with Exchange 2000 data stores).
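The decomposition, routing, and aggregation mechanism described above can be sketched as a simple scatter/gather loop. The partitions, ranges, and records below are hypothetical; a real partitioned database such as SQL Server performs this work inside the query engine.

```python
# Scatter/gather sketch for a partitioned data set: the query is sent to every
# partition and the partial results are combined into one response. Partitions
# and records are hypothetical; a partitioned database does this in the engine.

PARTITIONS = {
    "A-F": [{"customer": "Adams", "overdue": True}],
    "G-M": [{"customer": "Jones", "overdue": False}],
    "N-R": [{"customer": "Quinn", "overdue": True}],
    "S-Z": [{"customer": "Smith", "overdue": True}],
}

def query_partition(rows, predicate):
    """Run the query against one partition's subset of the data."""
    return [row for row in rows if predicate(row)]

def scatter_gather(predicate):
    """Send the query to every partition and aggregate the partial results."""
    results = []
    for rows in PARTITIONS.values():
        results.extend(query_partition(rows, predicate))
    return results

# "Return all accounts that have overdue payments" must span every partition.
print(scatter_gather(lambda row: row["overdue"]))
```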
Applications may also be partitioned along functional lines as well as data sets.
Figure 7: Partitioned Application - Functional Partitions
In the above example, each node in the cluster is performing a different function, yet together
the nodes provide a single, uniform service. One node provides a catalog service, another a
billing service, another a credit-check service, and so on; the cluster as a whole provides a single
"book buying service" to clients.
Computational clusters are built explicitly to support massively parallel applications. In other
words, the applications are specifically written to decompose a problem into many (potentially
thousands of) sub-operations and execute them in parallel across a set of machines. This
type of application is also a partitioned application. Computational clusters typically provide a
set of libraries for cluster-wide communication and synchronization primitives tailored
to the environment (MPI is one such set of libraries). These clusters have been termed
Beowulf clusters. Microsoft has created the High Performance Computing initiative to provide
support for this type of cluster.
Application Deployments
Applications can be deployed in different ways across a cluster depending on what style of application is
being deployed and the number of applications that are being deployed on the same cluster. They can be
characterized as follows:
1) One, single instance application
In this type of deployment, one node of the cluster is executing the application so that it can
service user requests; the other nodes are in standby, waiting to host the application in the
event of a failure. This type of deployment is typically suited to 2-node failover clusters and is
not typically used for N-node clusters, since only 1/N of the total capacity is being used. This
is what some people term an active/passive cluster.
2) Several, single instance applications
Several single instance applications can be deployed on the same cluster. Each application is
independent of the others. In a failover cluster, each application can fail over independently
of the others, so if a node hosting multiple applications fails, the
different applications may fail over to different nodes in the cluster. This type of deployment is
typically used for consolidation scenarios where multiple applications are hosted on a set of
nodes and the cluster provides a highly available environment. In this environment, careful
capacity planning is required to ensure that, in the event of failures, the nodes have sufficient
capacity (CPU, memory, I/O bandwidth, etc.) to support the increased load.
3) A single, multiple instance application
In this environment, while the cluster is only supporting one application, various pieces of the
application are running on the different nodes. In the case of a cloned application, each node
is running the same code against the same data. This is the typical web front-end scenario
(all of the nodes in the cluster are identical). In the event of a failure, the capacity is reduced.
In a partitioned application, the individual partitions are typically deployed across the nodes in
the cluster. If a failure occurs, multiple partitions may be hosted on a single node. Careful
planning is required to ensure that, in the event of a failure, the application SLA is achieved.
Take a 4-node cluster as an example. If an application is partitioned into four pieces, then in
normal running each node would host one partition. In the event that one node failed, two
nodes would continue to host one partition each, and the remaining node would end up hosting two
partitions, potentially giving it twice the load. If the application were instead split into 12
partitions, each node would host three partitions in the normal case. In the event of a failure,
each of the failed node's partitions could be configured to fail over to a different node, so that the
remaining nodes would each host four partitions, spreading the load. The cost,
however, is that supporting 12 partitions may involve more overhead than supporting four
(see the sketch after this list).
4) Several, multiple instance applications
Of course, several multiple instance applications may be deployed on the same cluster
(indeed, single instance and multiple instance applications may be deployed on the same cluster). In the
case of multiple cloned applications, each node simply runs one instance of each
application. In the case of partitioned applications, capacity planning as well as defining
failover targets becomes increasingly complex.
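The sketch referenced in item 3 above illustrates why finer partitioning spreads the load better after a failure. The node names, partition counts, and round-robin placement are illustrative assumptions, not the cluster service's actual placement algorithm.

```python
# Sketch for the 12-partitions-on-4-nodes example in item 3: with more, smaller
# partitions, a failed node's work can be spread across all survivors instead of
# landing on one node. Names, counts, and round-robin placement are illustrative.

def assign_partitions(nodes, partition_count):
    """Round-robin the partitions across the nodes."""
    layout = {node: [] for node in nodes}
    for p in range(partition_count):
        layout[nodes[p % len(nodes)]].append(f"P{p}")
    return layout

def fail_node(layout, failed):
    """Redistribute the failed node's partitions round-robin over the survivors."""
    survivors = [node for node in layout if node != failed]
    new_layout = {node: list(parts) for node, parts in layout.items() if node != failed}
    for i, partition in enumerate(layout[failed]):
        new_layout[survivors[i % len(survivors)]].append(partition)
    return new_layout

layout = assign_partitions(["Node1", "Node2", "Node3", "Node4"], 12)
print({node: len(parts) for node, parts in layout.items()})                      # 3 each
print({node: len(parts) for node, parts in fail_node(layout, "Node1").items()})  # 4 each
```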
Failover Policies
Failover is the mechanism that single instance applications and the individual partitions of a partitioned
application typically employ for high availability (the term Pack has been coined to describe a highly
available, single instance application or partition).
In a 2-node cluster, defining failover policies is trivial. If one node fails, the only option is to failover to the
remaining node. As the size of a cluster increases, different failover policies are possible and each one
has different characteristics.
Failover Pairs
In a large cluster, failover policies can be defined such that each application is set to failover between two
nodes. The simple example below shows two applications App1 and App2 in a 4-node cluster.
Figure 8: Failover pairs (App1 can fail over from Node 1 to Node 2; App2 can fail over from Node 3 to Node 4)
This configuration has pros and cons:
Pros:
• Good for clusters that are supporting heavy-weight applications (applications that consume a significant
amount of system resources such as CPU, memory, or I/O bandwidth), such as databases.
This configuration ensures that in the event of failure, two applications will not be hosted
on the same node.
• Very easy to plan capacity. Each node is sized based on the application that it will need
to host (just like a 2-node cluster hosting one application).
• The effect of a node failure on the availability and performance of the system is very easy to
determine.
• You get the flexibility of a larger cluster. In the event that a node is taken out for maintenance,
the "buddy" for a given application can be changed dynamically (you may end up with the
standby policy below).
Cons:
• In simple configurations, such as the one above, only 50% of the capacity of the cluster is
in use.
• Administrator intervention may be required in the event of multiple failures.
Failover pairs are supported by server clusters on all versions of Windows by limiting the possible owner
list for each resource to a given pair of nodes.
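Expressed purely as data, a failover-pair policy amounts to restricting each application's possible owner list to its two buddy nodes. The sketch below models the concept only; it is not the server cluster API, and the application and node names are hypothetical.

```python
# Data-only model of a failover-pair ("buddy pair") policy: each application's
# possible owner list contains just its two buddy nodes, so a failover can only
# land on the partner node. Hypothetical names; not the server cluster API.

POSSIBLE_OWNERS = {
    "App1": ["Node1", "Node2"],  # App1 normally runs on Node1, buddy is Node2
    "App2": ["Node3", "Node4"],  # App2 normally runs on Node3, buddy is Node4
}

def failover_target(app, failed_node, up_nodes):
    """Return the surviving buddy for the application, or None if both are down."""
    for node in POSSIBLE_OWNERS[app]:
        if node != failed_node and node in up_nodes:
            return node
    return None  # both buddies down: administrator intervention is required

print(failover_target("App1", "Node1", {"Node2", "Node3", "Node4"}))  # Node2
```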
Hot-Standby Server
To reduce the overhead of failover pairs, the “spare” node for each pair may be consolidated into a single
node, providing a hot standby server that is capable of picking up the work in the event of a failure.
Figure 9: Standby Server (App1 on Node 1 and App2 on Node 3 can both fail over to the standby node, Node 2)
The standby server configuration has pros and cons:
Pros:
• Good for clusters that are supporting heavy-weight applications such as databases. This
configuration ensures that in the event of a single failure, two applications will not be
hosted on the same node.
• Very easy to plan capacity. Each node is sized based on the application that it will need
to host; the spare is sized to be the maximum of the other nodes.
• The effect of a node failure on the availability and performance of the system is very easy to
determine.
Cons:
• The configuration is targeted at a single point of failure.
• Does not really handle multiple failures well. This may be an issue during scheduled
maintenance, when the spare may be in use.
Server clusters support standby servers today using a combination of the possible owners list and the
preferred owners list. The preferred node should be set to the node that the application will run on by
default and the possible owners for a given resource should be set to the preferred node and the spare
node.
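The same idea can be written down for the standby-server policy: each application's preferred owner is its default node, and its possible owners are that node plus the shared spare. Again, this is a conceptual sketch with hypothetical names, not the actual cluster configuration interface.

```python
# Conceptual sketch of the standby-server policy: the preferred owner is the
# node the application runs on by default, and the possible owners are that
# node plus the shared spare. Names are hypothetical.

SPARE = "Node2"

PLACEMENT = {
    # application: (preferred owner, possible owners)
    "App1": ("Node1", ["Node1", SPARE]),
    "App2": ("Node3", ["Node3", SPARE]),
}

def host_for(app, up_nodes):
    """Run on the preferred node if it is up; otherwise fall back to the spare."""
    preferred, possible = PLACEMENT[app]
    if preferred in up_nodes:
        return preferred
    for node in possible:
        if node in up_nodes:
            return node
    return None  # neither the default node nor the spare is available

print(host_for("App1", {"Node1", "Node2", "Node3"}))  # Node1 (normal operation)
print(host_for("App1", {"Node2", "Node3"}))           # Node2 (failed over to the spare)
```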
N+I
A standby server works well for 4-node clusters in some configurations; however, its ability to handle
multiple failures is limited. N+I configurations are an extension of the standby server concept in which there
are N nodes hosting applications and I nodes that act as spares.
Figure 10: N+I spare node configuration (any application instance can fail over to any spare node)
N+I configurations have the following pros and cons:
Pros:
• Good for clusters that are supporting heavy-weight applications such as databases or
Exchange. This configuration ensures that in the event of a failure, an application
instance will fail over to a spare node, not one that is already in use.
• Very easy to plan capacity. Each node is sized based on the application that it will need
to host.
• The effect of a node failure on the availability and performance of the system is very easy to
determine.
• The configuration works well for multiple failures.
Cons:
• Does not really handle multiple applications running in the same cluster well. This policy
is best suited to applications running on a dedicated cluster.
Server clusters support N+I scenarios in the Windows Server 2003 release using a cluster group public
property, AntiAffinityClassNames. This property can contain an arbitrary string of characters. In the
event of a failover, if the group being failed over has a non-empty string in the AntiAffinityClassNames
property, the failover manager will check all other nodes. If there are any nodes in the possible owners list
for the resource that are NOT hosting a group with the same value in AntiAffinityClassNames, then
those nodes are considered good targets for failover. If all nodes in the cluster are hosting groups that
contain the same value in the AntiAffinityClassNames property, then the preferred node list is used to
select a failover target.
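The selection rule described above can be sketched as follows. The data structures are simplified stand-ins for the cluster's group and node state; the real failover manager implements this internally, so treat the code as an illustration of the rule rather than its implementation.

```python
# Simplified model of the AntiAffinityClassNames failover-target selection
# described above. The dictionaries stand in for cluster state; the real
# failover manager implements this rule internally.
import random

def pick_failover_target(group, groups_by_node, possible_owners, preferred_list):
    """Prefer nodes that are not already hosting a group with the same
    anti-affinity class name; otherwise fall back to the preferred node list."""
    class_name = group.get("AntiAffinityClassNames", "")
    if class_name:
        friendly = [
            node for node in possible_owners
            if class_name not in {g.get("AntiAffinityClassNames", "")
                                  for g in groups_by_node.get(node, [])}
        ]
        if friendly:
            return random.choice(friendly)
    # Every candidate node already hosts a group of the same class (or no class
    # name is set): use the preferred node list to select the target.
    for node in preferred_list:
        if node in possible_owners:
            return node
    return None

groups_by_node = {
    "Node2": [{"name": "SQL-B", "AntiAffinityClassNames": "SQL"}],
    "Node3": [],  # idle spare
}
print(pick_failover_target({"name": "SQL-A", "AntiAffinityClassNames": "SQL"},
                           groups_by_node,
                           possible_owners=["Node2", "Node3"],
                           preferred_list=["Node2", "Node3"]))  # Node3
```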
Failover Ring
Failover rings allow each node in the cluster to run an application instance. In the event of a failure, the
application on the failed node is moved to the next node in sequence.
Figure 11: Failover Ring (each application fails over to the next node in the ring: App1 from Node 1 to Node 2, App2 from Node 2 to Node 3, App3 from Node 3 to Node 4, and App4 from Node 4 to Node 1)
This configuration has pros and cons:
Pros:
• Good for clusters that are supporting several small application instances, where the
capacity of any node is large enough to support several at the same time.
• The effect of a node failure on performance is easy to predict.
• Easy to plan capacity for a single failure.
Cons:
• The configuration does not work well for all cases of multiple failures. If Node 1 fails, Node 2
will host two application instances and Nodes 3 and 4 will host one application instance each.
If Node 2 then fails, Node 3 will be hosting three application instances and Node 4 will be
hosting one instance.
• Not well suited to heavy-weight applications, since multiple instances may end up being
hosted on the same node even if there are lightly loaded nodes.
Failover rings are supported by server clusters on the Windows Server 2003 release. This is done by
defining the order of failover for a given group using the preferred owner list. A node order should be
chosen and then the preferred node list should be set up with each group starting at a different node.
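A failover ring therefore boils down to rotating one node order so that each group's preferred owner list starts at a different node. The sketch below shows that rotation with hypothetical group and node names; it is an illustration of the setup, not the cluster service itself.

```python
# Sketch of a failover ring: choose one node order, then give each group a
# preferred owner list that starts at a different node, so the group on a
# failed node moves to the next node in the ring. Names are illustrative.

NODE_ORDER = ["Node1", "Node2", "Node3", "Node4"]

def ring_preferred_list(start_node):
    """Preferred owner list for a group whose default host is start_node."""
    i = NODE_ORDER.index(start_node)
    return NODE_ORDER[i:] + NODE_ORDER[:i]

PREFERRED = {f"App{i + 1}": ring_preferred_list(node)
             for i, node in enumerate(NODE_ORDER)}

def next_host(app, up_nodes):
    """First node in the group's preferred list that is still up."""
    return next((node for node in PREFERRED[app] if node in up_nodes), None)

print(PREFERRED["App2"])                               # ['Node2', 'Node3', 'Node4', 'Node1']
print(next_host("App2", {"Node1", "Node3", "Node4"}))  # Node2 is down -> Node3
```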
Random
In large clusters or even 4-node clusters that are running several applications, defining specific failover
targets or policies for each application instance can be extremely cumbersome and error prone. The best
policy in some cases is to allow the target to be chosen at random, with a statistical probability that this
will spread the load around the cluster in the event of a failure.
Random failover policies have pros and cons:
Pros:
• Good for clusters that are supporting several small application instances, where the
capacity of any node is large enough to support several at the same time.
• Does not require an administrator to decide where any given application should fail over
to.
• Provided that there are sufficient applications, or the applications are partitioned finely
enough, this provides a good mechanism to statistically load-balance the applications
across the cluster in the event of a failure.
• The configuration works well for multiple failures.
• Very well tuned to handling multiple applications, or many instances of the same
application, running in the same cluster.
Cons:
• Can be difficult to plan capacity. There is no real guarantee that the load will be balanced
across the cluster.
• The effect of a node failure on performance is not easy to predict.
• Not well suited to heavy-weight applications, since multiple instances may end up being
hosted on the same node even if there are lightly loaded nodes.
The Windows Server 2003 release of server clusters randomizes the failover target in the event of node
failure. Each resource group that has an empty preferred owners list will be failed over to a random node
in the cluster in the event that the node currently hosting it fails.
Customized control
There are some cases where specific nodes may be preferred for a given application instance.
A configuration that ties applications to nodes has pros and cons:
Pros:
• The administrator has full control over what happens when a failure occurs.
• Capacity planning is easy, since failure scenarios are predictable.
Cons:
• With many applications running in a cluster, defining a good policy for failures can be
extremely complex.
• Very hard to plan for multiple, cascaded failures.
Server clusters provide full control over the order of failover using the preferred node list feature. The full
semantics of the preferred node list can be defined as follows:
Preferred node list contains all nodes in the cluster:
• Move group to "best possible" initiated by an administrator – The group is moved to the highest node in the preferred node list that is up and running in the cluster.
• Failover due to node or group failure – The group is moved to the next node on the preferred node list.

Preferred node list contains a subset of the nodes in the cluster:
• Move group to "best possible" initiated by an administrator – The group is moved to the highest node in the preferred node list that is up and running in the cluster. If no nodes in the preferred node list are up and running, the group is moved to a random node.
• Failover due to node or group failure – The group is moved to the next node on the preferred node list. If the node that was hosting the group is the last on the list or was not in the preferred node list, the group is moved to a random node.

Preferred node list is empty:
• Move group to "best possible" initiated by an administrator – The group is moved to a random node.
• Failover due to node or group failure – The group is moved to a random node.
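The failure column of the table above can be sketched as a small function. This is a simplified model with hypothetical names, not the failover manager's actual code.

```python
# Sketch of the failure column of the table above: where a group moves when its
# host fails, given its preferred node list. Simplified model with hypothetical
# names; the failover manager implements this behavior internally.
import random

def failover_target(preferred, current_host, up_nodes, all_nodes):
    """Pick the node a group moves to when current_host fails."""
    survivors = [n for n in all_nodes if n in up_nodes and n != current_host]
    if not survivors:
        return None
    if not preferred:                        # empty list: move to a random node
        return random.choice(survivors)
    if current_host in preferred:
        for node in preferred[preferred.index(current_host) + 1:]:
            if node in up_nodes:             # next node on the preferred list
                return node
    # Host was last on the list, or not on it at all: move to a random node.
    return random.choice(survivors)

nodes = ["Node1", "Node2", "Node3", "Node4"]
print(failover_target(["Node1", "Node3"], "Node1",
                      up_nodes={"Node2", "Node3", "Node4"}, all_nodes=nodes))
# Node3: the next node on the preferred list that is up and running
```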
Load Balancing
Load balancing is a term that means many different things. Within a cluster environment, load balancing
is typically used in two different ways:
• Spreading client requests across a set of clones – In Windows, Network Load Balancing
(NLB) provides the capability to spread client TCP/IP requests across a set of nodes (typically
Web servers, ISA arrays, or Terminal Server machines). This functionality can also be
provided by external network infrastructure such as Cisco LocalDirector.
• Optimizing the compute resources across a failover cluster – A cluster that supports failover
allows application instances to "move" around the cluster. Historically this has been used to
ensure that, on failure, the application can be restarted elsewhere. However, the same
mechanism could be used with performance counters and application load metrics to provide
a way to balance the load of applications across the nodes in a cluster. This can be done in
one of two ways:
o Manual load balancing: Each group in a cluster can be moved around
independently of the other cluster groups. Where it is possible to break applications down into
fine-granularity groups, manual load balancing can be done using the
cluster administration tools. If the load on a given node is high relative to the other
nodes in the cluster, one or more groups can be manually moved to other cluster
nodes.
o Dynamic load balancing: Today, server clusters do not provide any load balancing
mechanism (beyond a randomized failover policy in Windows Server 2003). Dynamic
load balancing is an extremely complex issue: while moving an application may at
first appear to make sense, the cost of that move can be hard to determine. For
example, moving SQL Server may not only incur the cost of having to re-mount the
database, but it also loses the in-memory cache and state, which may have a
significant impact on performance from the client's perspective.
GenAI Pilot Implementation in the organizations
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 

Cluster and Application Deployments

Historically, we have described clusters and applications as being active/active or active/passive, which is a natural result of the fact that the earliest products supported only 2-node clusters. In some cases, however, this has led to confusion because the terms can be interpreted in different ways depending on whether the context is 1) how a given application instance is running, 2) how different instances of an application are running, or 3) whether the nodes in the cluster are performing useful work.

To fully understand the specific application and cluster deployments, we need to understand the following:

• How many instances of the application are actually up and running?
• How many instances of the same data are in the cluster?
• Do instances of the data move around?
• Do different instances of the same application run on different cluster nodes?
• Can different applications run on different cluster nodes?
• What kind of load does the application put on a server, and how will this load be re-distributed after a failover?

Before describing how an application is deployed, we must first define what an application is. An application, for the purposes of this document, is defined as the running code and data required to provide a single service to an end-user or client. Consider a couple of different examples to illustrate the point: A single instance of Microsoft® Word running on a workstation is a single application instance. If there are multiple instances of Word running, each instance is considered a different application instance. There are, however, more complex applications. For example, with Microsoft SQL Server™, a single database is considered to be a single application instance. Independent databases are considered to be different application instances. A single database, however, can be partitioned into multiple SQL Server instances and "tied together" using the SQL Server query engine. In this case, the set of SQL Server instances that are tied together to provide a single database image is considered a single application instance.

Fundamentally, there are five different attributes that together provide a complete view of the deployment and can be used to reason about or characterize the deployment:

• Server load – How much of a server's resources are consumed by the applications it supports, and how will this resource utilization be affected by failover?
• Application style – Is the application a single monolithic application that runs on a single node, or is the complete application split into smaller pieces that can be deployed around the cluster?
• Application deployment – How are the pieces of the application spread around the cluster in a given deployment?
• Failover policies – How is the application configured to behave after a failover?
• Application implementation – How is the application itself implemented?
Server Load

When deploying an application, it is always important to consider what demands it will make on a server's resources. With clustering, there is a related issue that also needs to be taken into account: how is the load re-distributed after a failover?

Server Load – The Basics

Consider one of the simplest cases, an active/active, 2-node file server cluster, with node A and node B each serving a single share. If node A fails, its resources will move to node B, placing an additional load on node B. In fact, if node A and node B were each running at only 50% capacity before the failure, node B will be completely saturated (100% of capacity) after the failover is completed, and performance may suffer. While this situation may not be optimal, it is important to remember that having all of the applications still running, even in a reduced-performance scenario, is a 100% improvement over what you would have without the high-availability protection that clusters provide. But this does bring up the notion of risk, and what amount of it you are willing to accept in order to protect the performance, and ultimately the availability, of your applications.

We have intentionally chosen the worst case (an active/active, 2-node cluster with each node running a single application that consumes half of the server's resources) for the purpose of clarity. With an additional node, the equation changes: there are more servers to support the workload, but if all three nodes are running at 50% capacity and there are two failures, the single remaining server will simply not be able to handle the accumulated load of the applications from both of the failed servers. Of course, the likelihood of two failures is considerably less than that of a single failure, so the risk is mitigated somewhat. Nevertheless, the load/risk tradeoff must be considered when deploying applications on a cluster. The more nodes in a cluster, the more options you have for distributing the workload.

If your requirements dictate that all of your clustered applications must run with no performance degradation, then you may need to consider some form of active/passive configuration. But even in this scenario, you must consider the risks of the various configurations. If you cannot accept even the slightest risk of any reduced performance under any conditions whatsoever, you will need a dedicated passive node for each active node. If, on the other hand, you are convinced that the risk of multiple failures is small, you have other choices.

If you have a 4-node or 8-node cluster, you may want to consider an "N+I" configuration. N+I, which is discussed in more detail in section 1.4.3, is a variant of active/passive, where N nodes are active and I nodes are passive, or reserve, nodes. Typically, the value for I is less than the value for N, and an N+I cluster topology can handle I failures before any performance degradation is likely. The risk is that with more than I failures, performance will likely decline, but once again, the likelihood of multiple failures is increasingly remote.
For this reason, N+I clusters are a useful configuration that balances the hardware cost of having 100% passive server capacity against the relatively low risk of multiple cluster node failures.

Server Load – Some More Realistic Configurations

The scenarios above were intentionally simplistic, assuming that one application imposed a monolithic load on each server, and thus its resource utilization could not be spread among more than one other server in the event of a failover. That is often not the case in the real world, especially for file and print servers, so we will take a look at some additional scenarios with a 4-node cluster, named ABCD, having nodes A, B, C, and D.

Typically a single server will support the load of more than one application. If, under normal conditions, each server were loaded at 25%, then the ABCD cluster could survive the loss of three members before a likely loss of application availability, which would be nearly a worst-case scenario. The following series of figures illustrates what would happen with the application load in a 4-node cluster for successive node failures. The shaded, or patterned, areas indicate the capacity demands of the running applications. Further, the example below assumes that the application load on any given server is divisible and can be redistributed among any of the surviving nodes.

Figure 1.1: Cluster under normal operating conditions (each node loaded at 25%)
Figure 1.2: Cluster after a single node failure. Note redistribution of application load.

Figure 1.3: Cluster after two node failures. Each surviving node is now approximately 50% loaded.

Figure 1.4: After three node failures, the single surviving node is at full capacity.

If each node were running at 75% capacity, then without sensible failover policies, even a single node failure could result in loss of application availability. Depending on the application(s), however, you can specify that, in the event of a server failure, some percentage of the applications should fail over to node A, node B, node C, and node D. If the applications are spread evenly among the surviving nodes, then this cluster could now survive the loss of a single machine, because one third of the failed server's load (one third of 75% is 25%) is allocated to each of the three surviving machines. The result is three fully loaded servers (each of the nodes running at 75% capacity now has an additional 25%), but all applications are still available.

As a variation on the example illustrated previously, the following series of figures depicts what happens to a 4-node cluster in which each node is running at approximately 33% capacity. Further, in this case the application load is indivisible and cannot be spread among multiple other servers in the event of a failover (perhaps it is a single application, or multiple applications that depend on the same resource).
Figure 2.1: Cluster under normal operating conditions (each node loaded at approximately 33%)

Figure 2.2: Cluster after a single node failure. Note the redistributed application load.

Figure 2.3: Cluster after a second node failure. Each surviving node is now running at approximately 66% capacity. Note that in the event of another node failure, this cluster will no longer be capable of supporting all four of these applications.

Figure 2.4: After a third failure, the single surviving server can only support three of the four applications.
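To make the capacity arithmetic behind the two figure series concrete, the following Python sketch models the percentages used above. It is purely illustrative (it is a model of the arithmetic, not of any cluster software): the first function spreads a divisible load evenly across survivors, and the second shows why an indivisible 33% application cannot always be re-hosted.

# Illustrative model of the capacity math above; not cluster software.

def divisible_load_after_failures(per_node_load, total_nodes, failures):
    """Load on each survivor when a failed node's load can be spread evenly."""
    survivors = total_nodes - failures
    if survivors <= 0:
        raise ValueError("no surviving nodes")
    return per_node_load * total_nodes / survivors

# Figures 1.1-1.4: four nodes, each initially loaded at 25%.
for failed in range(4):
    load = divisible_load_after_failures(0.25, 4, failed)
    print(failed, "failure(s): each survivor at", round(load * 100), "percent")

def indivisible_placement(app_loads, surviving_nodes):
    """Greedy placement when whole applications must stay on one node.
    Returns the per-node load and any applications that no longer fit."""
    load = [0.0] * surviving_nodes
    dropped = []
    for app in app_loads:
        target = min(range(surviving_nodes), key=lambda n: load[n])
        if load[target] + app <= 1.0:
            load[target] += app
        else:
            dropped.append(app)
    return load, dropped

# Figures 2.1-2.4: four indivisible 33% applications; with only one survivor,
# just three of the four applications can still be hosted.
print(indivisible_placement([0.33, 0.33, 0.33, 0.33], 1))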
Another style of failover policy is "Failover Pairs", also known as "Buddy Pairs". Assuming each of the four servers is loaded at 50% or less, a failover "buddy" can be associated with each machine. This allows a cluster in this configuration to survive two failures. More details on Failover Pairs can be found in section 1.4.1.

Taking the previous Failover Pair example, we can convert it to an active/passive configuration by loading two servers at 100% capacity and having two passive backup servers. This active/passive configuration can also survive two failures, but note that under ordinary circumstances, two servers remain unused. Furthermore, performance of these servers at 100% load is not likely to be as good as with the Failover Pair configuration, where each machine is only running at 50% utilization. Between these two examples, note that you have the same number of servers, the same number of applications, and the same survivability (in terms of how many nodes can fail without jeopardizing application availability). However, the Failover Pair configuration clearly comes out ahead of active/passive in terms of both performance and economy.

Still using our ABCD 4-node cluster, consider what happens if we configure it as an N+I cluster (explained in more detail in section 1.4.3) where three nodes are running at 100% capacity and there is a single standby node. This cluster can only survive a single failure. As before, comparing it to the example where each server is running at 75% capacity, you again have the same number of servers, applications, and survivability, but performance and economy can suffer when you have passive servers backing up active servers running at 100% load.

Application Style

Now it is time to consider some aspects of the application's design and how they affect the way it is deployed. Applications running in a cluster can be characterized as one of the following:

• Single Instance

In a single instance application, only one application instance is running in the cluster at any point in time. An example of this is the DHCP service in a server cluster. At any point in time, there is only one instance of the DHCP service running in a cluster. The service is made highly available using the failover support provided with server clusters.

Single instance applications typically have state that cannot be partitioned across the nodes. In the case of DHCP, the set of IP addresses that have been leased is relatively small, but potentially highly volatile in a large environment. To avoid the complexities of synchronizing the state around the cluster, the service runs as a single instance application.
Figure 3: Single instance application

In the example above, the cluster has four nodes. The single instance application, by definition, can be running on only one node at a time.

• Multiple Instance

A multiple instance application is one where multiple instances of the same code, or different pieces of code that cooperate to provide a single service, can be executing around the cluster. Together these instances provide the illusion of a single service to an end-user or client computer. There are two different types of multiple instance applications, depending on the type of application and data:
o Cloned Application. A cloned application is a single logical application that consists of two or more instances of the same code running against the same data set. Each instance of the application is self-contained, thus enabling a client to make a request to any instance of the application with the guarantee that, regardless of the instance, the client will receive the same result.

A cloned application is scalable, since additional application instances can be deployed to service client requests, and the instances can be deployed across different nodes, thus enabling an application to grow beyond the capacity of a single node. By deploying application instances on different nodes, the application is made highly available: in the event that a node hosting application instances fails, there are instances running on other nodes that are still available to service client requests. Typically, client requests are load-balanced across the nodes in the cluster to spread the requests amongst the application instances.

Applications that run in this environment do not have long-running in-memory state that spans client requests, since each client request is treated as an independent operation and each request can be load-balanced independently. In addition, the data set must be kept consistent across the entire application. As a result, cloning is a technique typically used for applications that have read-only data or data that is changed infrequently. Web server front-ends, scalable edge servers such as arrays of firewalls, and middle-tier business logic fall into this category. While these applications are typically called stateless applications¹, they can save client-specific, session-oriented state in a persistent store that is available to all instances of the cloned application; however, the client must be given a token or key that it can present with subsequent requests, so that whichever application instance services each request can associate the request with the appropriate client state.

The Microsoft Windows® platform provides Network Load Balancing as the basic infrastructure for building scale-out clusters with the ability to spray client requests across the nodes. Application Center provides a single image view of management for these environments.

¹ This is really a misnomer; the applications have state, however the state does not span individual client requests.
Figure 4: Cloned Application

In the example above, the same application "App" is running on each node. Each instance of the application is accessing the same data set, in this case the data set A–Z. This example shows each instance having access to its own copy of the data set (created and kept consistent using some form of staging or replication technique); some applications can instead share the same data set (where the data is made available to the cluster using a file share, for example).

Figure 5: Cloned Application using a file share

o Partitioned Applications. Applications that have long-running in-memory state or have large, frequently updated data sets cannot be easily cloned. These applications are typically called stateful applications. The cost of keeping the data consistent across many instances of the application would be prohibitive.
Fortunately, however, many of these applications have data sets or functionality that can be readily partitioned. For example, a large file server can be partitioned by dividing the files along the directory structure hierarchy, or a large customer database can be partitioned along customer number or customer name boundaries (customers from A to L in one database and customers from M to Z in another database, for example). In other cases, the functionality can be partitioned or componentized. Once an application is partitioned, the different partitions can be deployed across a set of nodes in a cluster, thus enabling the complete application to scale beyond the bounds of a single node.

In order to present a single application image to the clients, a partitioned application requires an application-dependent decomposition, routing and aggregation mechanism that allows a single client request to be distributed across the set of partitions and the results from the partitions to be combined into a single response back to the client. For example, in a partitioned customer database, a single SQL query to return all of the accounts that have overdue payments requires that the query be sent to every partition of the database. Each partition will contain a subset of customer records that must be combined into a single data set to be returned to the client.

Partitioning an application allows it to scale but does not, by itself, provide high availability. If a node fails that is hosting a partition of the application, that piece of the data set or that piece of functionality is no longer accessible. Partitioning is typically application-specific, since the aggregation mechanism is dependent on the type of application and the type of data returned. SQL Server, Exchange data stores and DFS are all examples of applications and services that can be partitioned for scalability.

Figure 6: Partitioned Application – Data Partitioning

In the example above, each node is running multiple instances of the same application against different pieces of the complete data set (A–F, G–M, N–R and S–Z in the figure). A single request from a client application can span multiple instances, for example a query to return all the records in the database. This splitting of the client request across the different application instances, and the aggregation of the single result passed back to the client, can be done either by the applications on the server cooperating amongst themselves (for example, the SQL Server query engine) or by the client (as is the case with Exchange 2000 data stores).
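The decomposition and aggregation step can be sketched as a simple scatter-gather routine. The following Python fragment is illustrative only: the customer records are invented, and a real partitioned database would perform this decomposition and aggregation inside the query engine rather than in application code.

# Illustrative scatter-gather over the data partitions shown in Figure 6.

PARTITIONS = {
    "A-F": [{"customer": "Adams", "overdue": True},
            {"customer": "Baker", "overdue": False}],
    "G-M": [{"customer": "Green", "overdue": True}],
    "N-R": [{"customer": "Owens", "overdue": False}],
    "S-Z": [{"customer": "Smith", "overdue": True}],
}

def query_partition(name, predicate):
    """Stand-in for running the query against a single partition."""
    return [row for row in PARTITIONS[name] if predicate(row)]

def scatter_gather(predicate):
    """Send the query to every partition and merge the partial results."""
    results = []
    for name in PARTITIONS:
        results.extend(query_partition(name, predicate))
    return results

# "All accounts with overdue payments" must be sent to every partition.
print(scatter_gather(lambda row: row["overdue"]))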
Applications may also be partitioned along functional lines as well as along data sets.

Figure 7: Partitioned Application – Functional Partitions

In the above example, each node in the cluster is performing a different function; however, the cluster as a whole provides a single, uniform service. One node is providing a catalog service, one a billing service, one a credit-check service, and so on. The cluster, though, provides a single "book buying service" to clients.

Computational clusters are built specifically to support massively parallel applications. In other words, the applications are written to decompose a problem into many (potentially thousands of) sub-operations and execute them in parallel across a set of machines. This type of application is also a partitioned application. Computational clusters typically provide a set of libraries for cluster-wide communication and synchronization primitives tailored to the environment (MPI is one such set of libraries). These clusters have been termed Beowulf clusters. Microsoft has created the High Performance Computing initiative to provide support for this type of cluster.
Application Deployments

Applications can be deployed in different ways across a cluster, depending on the style of application being deployed and the number of applications being deployed on the same cluster. The deployments can be characterized as follows:

1) One single instance application

In this type of deployment, one node of the cluster is executing the application so that it can service user requests; the other nodes are in standby, waiting to host the application in the event of failure. This type of deployment is typically suited to 2-node failover clusters and is not typically used for N-node clusters, since only 1/N of the total capacity is being used. This is what some people term an active/passive cluster.

2) Several single instance applications

Several single instance applications can be deployed on the same cluster. Each application is independent of the others. In a failover cluster, each application can fail over independently of the others, so in the case of a node hosting multiple applications, if that node fails, the different applications may fail over to different nodes in the cluster. This type of deployment is typically used for consolidation scenarios where multiple applications are hosted on a set of nodes and the cluster provides a highly available environment. In this environment, careful capacity planning is required to ensure that, in the event of failures, the nodes have sufficient capacity (CPU, memory, IO bandwidth, etc.) to support the increased load.

3) A single multiple instance application

In this environment, while the cluster is only supporting one application, various pieces of the application are running on the different nodes. In the case of a cloned application, each node is running the same code against the same data. This is the typical web front-end scenario (all of the nodes in the cluster are identical). In the event of a failure, the capacity is reduced.

In a partitioned application, the individual partitions are typically deployed across the nodes in the cluster. If a failure occurs, multiple partitions may be hosted on a single node. Careful planning is required to ensure that, in the event of a failure, the application SLA is achieved. Take a 4-node cluster as an example. If an application is partitioned into four pieces, then in normal running each node would host one partition. In the event that one node failed, two nodes would continue to host one partition each and the remaining node would end up hosting two partitions, potentially giving that node twice the load. If the application were instead split into 12 partitions, each node would host three partitions in the normal case. In the event of a failure, each of the failed node's partitions could be configured to fail over to a different node, so that the remaining nodes each host four partitions, spreading the load (the sketch after this list illustrates the difference). The cost, however, is that supporting 12 partitions may involve more overhead than supporting four partitions.

4) Several multiple instance applications

Of course, several multiple instance applications may be deployed on the same cluster (indeed, single instance and multiple instance applications may be deployed on the same cluster). In the case of multiple cloned applications, each node simply runs one instance of each application. In the case of partitioned applications, capacity planning as well as defining failover targets becomes increasingly complex.
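The following Python sketch illustrates the 4- versus 12-partition trade-off from deployment style 3. It simply round-robins hypothetical partitions onto whichever nodes are up; it is a model of the arithmetic, not of the cluster service's actual failover logic.

# Illustrative model of the 4- versus 12-partition example.

def partitions_per_node(num_partitions, nodes):
    """Round-robin placement of partition numbers onto the nodes that are up."""
    placement = {node: [] for node in nodes}
    for p in range(num_partitions):
        placement[nodes[p % len(nodes)]].append(p)
    return placement

all_nodes = ["A", "B", "C", "D"]
survivors = ["B", "C", "D"]          # node A has failed

for count in (4, 12):
    before = partitions_per_node(count, all_nodes)
    after = partitions_per_node(count, survivors)
    print(count, "partitions:",
          {n: len(before[n]) for n in all_nodes}, "->",
          {n: len(after[n]) for n in survivors})
# 4 partitions:  one survivor ends up with twice the load of the others.
# 12 partitions: each survivor picks up just one extra partition.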
Failover Policies

Failover is the mechanism that single instance applications and the individual partitions of a partitioned application typically employ for high availability (the term Pack has been coined to describe a highly available, single instance application or partition). In a 2-node cluster, defining failover policies is trivial: if one node fails, the only option is to fail over to the remaining node. As the size of a cluster increases, different failover policies are possible, and each one has different characteristics.

Failover Pairs

In a large cluster, failover policies can be defined such that each application is set to fail over between two nodes. The simple example below shows two applications, App1 and App2, in a 4-node cluster: App1 can fail over from Node 1 to Node 2, and App2 can fail over from Node 3 to Node 4.

Figure 8: Failover pairs

This configuration has pros and cons:
Pros:

• Good for clusters that are supporting heavy-weight² applications, such as databases. This configuration ensures that, in the event of failure, two applications will not be hosted on the same node.
• Very easy to plan capacity. Each node is sized based on the application that it will need to host (just like a 2-node cluster hosting one application).
• The effect of a node failure on the availability and performance of the system is very easy to determine.
• You get the flexibility of a larger cluster. In the event that a node is taken out for maintenance, the "buddy" for a given application can be changed dynamically (you may end up with the standby policy described below).

Cons:

• In simple configurations, such as the one above, only 50% of the capacity of the cluster is in use.
• Administrator intervention may be required in the event of multiple failures.

² A heavy-weight application is one that consumes a significant amount of system resources such as CPU, memory, or IO bandwidth.

Failover pairs are supported by server clusters on all versions of Windows by limiting the possible owner list for each resource to a given pair of nodes.
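The following Python sketch is a simple model of this policy. It is not Cluster API code; the node and group names are illustrative, and the point is only that restricting the possible owners to a pair restricts where a group may come online.

# Simple model of failover pairs: each group's possible-owner list is limited
# to a pair of nodes, so a group can only come online on its "buddy".

possible_owners = {
    "App1": ["Node1", "Node2"],   # App1 normally on Node1, buddy is Node2
    "App2": ["Node3", "Node4"],   # App2 normally on Node3, buddy is Node4
}

def failover_target(group, current_node, nodes_up):
    """Return the buddy node if it is up; otherwise the group stays offline."""
    for node in possible_owners[group]:
        if node != current_node and node in nodes_up:
            return node
    return None   # no eligible owner is available

print(failover_target("App1", "Node1", {"Node2", "Node3", "Node4"}))  # Node2
print(failover_target("App1", "Node1", {"Node3", "Node4"}))           # None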
Hot-Standby Server

To reduce the overhead of failover pairs, the "spare" node for each pair may be consolidated into a single node, providing a hot-standby server that is capable of picking up the work in the event of a failure. In the example below, App1 can fail over from Node 1 to Node 2, and App2 can fail over from Node 3 to the same standby node, Node 2.

Figure 9: Standby Server

The standby server configuration has pros and cons:

Pros:

• Good for clusters that are supporting heavy-weight applications such as databases. This configuration ensures that, in the event of a single failure, two applications will not be hosted on the same node.
• Very easy to plan capacity. Each node is sized based on the application that it will need to host; the spare is sized to be the maximum of the other nodes.
• The effect of a node failure on the availability and performance of the system is very easy to determine.

Cons:

• The configuration is targeted towards a single point of failure.
• Does not really handle multiple failures well. This may be an issue during scheduled maintenance, when the spare may already be in use.

Server clusters support standby servers today using a combination of the possible owners list and the preferred owners list. The preferred node should be set to the node that the application will run on by default, and the possible owners for a given resource should be set to the preferred node and the spare node.
N+I

The standby server configuration works well for 4-node clusters in some configurations; however, its ability to handle multiple failures is limited. N+I configurations are an extension of the standby server concept where there are N nodes hosting applications and I nodes which are spares. In the example below, any application instance can fail over to any spare node.

Figure 10: N+I Spare node configuration

N+I configurations have the following pros and cons:

Pros:

• Good for clusters that are supporting heavy-weight applications such as databases or Exchange. This configuration ensures that, in the event of a failure, an application instance will fail over to a spare node, not to one that is already in use.
• Very easy to plan capacity. Each node is sized based on the application that it will need to host.
• The effect of a node failure on the availability and performance of the system is very easy to determine.
• The configuration works well for multiple failures.

Cons:

• Does not really handle multiple applications running in the same cluster well. This policy is best suited to applications running on a dedicated cluster.

Server clusters support N+I scenarios in the Windows Server 2003 release using a cluster group public property, AntiAffinityClassNames. This property can contain an arbitrary string of characters. In the event of a failover, if the group being failed over has a non-empty string in the AntiAffinityClassNames property, the failover manager will check all other nodes. If there are any nodes in the possible owners list for the resource that are not hosting a group with the same value in AntiAffinityClassNames, those nodes are considered good targets for failover. If all nodes in the cluster are hosting groups that contain the same value in the AntiAffinityClassNames property, then the preferred node list is used to select a failover target.
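The selection logic just described can be sketched as follows. This is a simplified Python model of the documented behavior, not the failover manager itself; the node names and the "SQL" class value are illustrative.

# Simplified model of the anti-affinity check described above.
# hosted_groups maps each node that is up to the AntiAffinityClassNames
# values of the groups it currently hosts.

def candidate_failover_nodes(group_class, possible_owners, hosted_groups,
                             preferred_list):
    """Prefer possible owners not already hosting a group of the same class;
    otherwise fall back to the preferred node list."""
    if group_class:
        candidates = [node for node in possible_owners
                      if node in hosted_groups
                      and group_class not in hosted_groups[node]]
        if candidates:
            return candidates
    # Every available node already hosts the same class (or the property is
    # empty), so the preferred node list is used instead.
    return [node for node in preferred_list if node in hosted_groups]

# Node1 has failed; Node2 already hosts a "SQL" group, Node4 and Node5 are spares.
hosted = {"Node2": {"SQL"}, "Node4": set(), "Node5": set()}
print(candidate_failover_nodes("SQL", ["Node2", "Node4", "Node5"],
                               hosted, ["Node2", "Node4"]))   # ['Node4', 'Node5']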
Failover Ring

Failover rings allow each node in the cluster to run an application instance. In the event of a failure, the application on the failed node is moved to the next node in sequence. In the example below, App1 fails over from Node 1 to Node 2, App2 from Node 2 to Node 3, App3 from Node 3 to Node 4, and App4 from Node 4 back to Node 1.

Figure 11: Failover Ring

This configuration has pros and cons:

Pros:

• Good for clusters that are supporting several small application instances, where the capacity of any node is large enough to support several of them at the same time.
• The effect of a node failure on performance is easy to predict.
• Easy to plan capacity for a single failure.

Cons:

• The configuration does not work well for all cases of multiple failures. If Node 1 fails, Node 2 will host two application instances and Nodes 3 and 4 will host one application instance each. If Node 2 then fails, Node 3 will be hosting three application instances and Node 4 will be hosting one.
• Not well suited to heavy-weight applications, since multiple instances may end up being hosted on the same node even if there are lightly loaded nodes.

Failover rings are supported by server clusters in the Windows Server 2003 release. This is done by defining the order of failover for a given group using the preferred owner list: a node order should be chosen, and the preferred node list for each group should be set up starting at a different node.
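Setting up the preferred owner lists for a failover ring amounts to rotating a single node order, as the following illustrative Python sketch shows (the group names are hypothetical).

# Illustrative helper that builds the rotated preferred node lists for a
# failover ring.

def ring_preferred_lists(node_order):
    """Give each group the same node order, rotated to start at its home node."""
    return {"App" + str(i + 1): node_order[i:] + node_order[:i]
            for i in range(len(node_order))}

for group, preferred in ring_preferred_lists(["Node1", "Node2", "Node3", "Node4"]).items():
    print(group, preferred)
# App1 ['Node1', 'Node2', 'Node3', 'Node4']
# App2 ['Node2', 'Node3', 'Node4', 'Node1']  ...and so on around the ring.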
Random

In large clusters, or even 4-node clusters that are running several applications, defining specific failover targets or policies for each application instance can be extremely cumbersome and error prone. The best policy in some cases is to allow the target to be chosen at random, with a statistical probability that this will spread the load around the cluster in the event of a failure.

Random failover policies have pros and cons:

Pros:

• Good for clusters that are supporting several small application instances, where the capacity of any node is large enough to support several of them at the same time.
• Does not require an administrator to decide where any given application should fail over to.
• Provided that there are sufficient applications, or the applications are partitioned finely enough, this provides a good mechanism to statistically load-balance the applications across the cluster in the event of a failure.
• The configuration works well for multiple failures.
• Very well tuned to handling multiple applications, or many instances of the same application, running in the same cluster.

Cons:

• It can be difficult to plan capacity. There is no real guarantee that the load will be balanced across the cluster.
• The effect of a node failure on performance is not easy to predict.
• Not well suited to heavy-weight applications, since multiple instances may end up being hosted on the same node even if there are lightly loaded nodes.

The Windows Server 2003 release of server clusters randomizes the failover target in the event of node failure. Each resource group that has an empty preferred owners list will be failed over to a random node in the cluster in the event that the node currently hosting it fails.

Customized control

There are some cases where specific nodes may be preferred for a given application instance. A configuration that ties applications to nodes has pros and cons:

Pros:

• The administrator has full control over what happens when a failure occurs.
• Capacity planning is easy, since failure scenarios are predictable.

Cons:

• With many applications running in a cluster, defining a good policy for failures can be extremely complex.
• It is very hard to plan for multiple, cascaded failures.

Server clusters provide full control over the order of failover using the preferred node list feature. The full semantics of the preferred node list can be defined as follows:
If the preferred node list contains all nodes in the cluster:
• Move group to "best possible" (initiated by an administrator): the group is moved to the highest node in the preferred node list that is up and running in the cluster.
• Failover due to node or group failure: the group is moved to the next node on the preferred node list.

If the preferred node list contains a subset of the nodes in the cluster:
• Move group to "best possible" (initiated by an administrator): the group is moved to the highest node in the preferred node list that is up and running in the cluster. If no nodes in the preferred node list are up and running, the group is moved to a random node.
• Failover due to node or group failure: the group is moved to the next node on the preferred node list. If the node that was hosting the group is the last node on the list, or was not in the preferred node list, the group is moved to a random node.

If the preferred node list is empty:
• Move group to "best possible" (initiated by an administrator): the group is moved to a random node.
• Failover due to node or group failure: the group is moved to a random node.
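The following Python sketch models these documented rules. It is an illustration rather than cluster service code; a real cluster also honors possible-owner restrictions, which are omitted here.

# Model of the preferred-node-list semantics listed above.
import random

def move_best_possible(preferred, nodes_up):
    """Administrator-initiated move to the "best possible" node."""
    for node in preferred:                      # highest node that is up
        if node in nodes_up:
            return node
    return random.choice(sorted(nodes_up))      # empty or unavailable list

def failover_target(preferred, current, nodes_up):
    """Failover caused by a node or group failure."""
    if current in preferred:
        for node in preferred[preferred.index(current) + 1:]:
            if node in nodes_up:                # next node on the list
                return node
    return random.choice(sorted(nodes_up))      # last on list, unlisted, or empty

nodes_up = {"Node2", "Node3", "Node4"}
print(move_best_possible(["Node1", "Node2", "Node3"], nodes_up))        # Node2
print(failover_target(["Node1", "Node2", "Node3"], "Node1", nodes_up))  # Node2
print(failover_target([], "Node1", nodes_up))                           # random node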
Load Balancing

Load balancing is a term that means many different things. Within a cluster environment, load balancing is typically used in two different ways:

• Spreading client requests across a set of clones – In Windows, Network Load Balancing (NLB) provides the capability to spread client TCP/IP requests across a set of nodes (typically Web servers, ISA Server arrays, or Terminal Server machines). This functionality can also be provided by external network infrastructure such as Cisco LocalDirector.

• Optimizing the compute resources across a failover cluster – A cluster that supports failover allows application instances to "move" around the cluster. Historically, this has been used to ensure that on failure the application can be restarted elsewhere. However, the same mechanism could be used with performance counters and application load metrics to provide a way to balance the load of applications across the nodes in a cluster. This can be done in one of two ways:

o Manual load balancing: Each group in a cluster can be moved around independently of the other cluster groups. By breaking applications down into fine-granularity groups where possible, manual load balancing can be done using the cluster administration tools. If the load on a given node is high relative to the other nodes in the cluster, one or more groups can be manually moved to other cluster nodes (a simple sketch of this decision follows below).

o Dynamic load balancing: Today, server clusters do not provide any load balancing mechanism (beyond a randomized failover policy in Windows Server 2003). Dynamic load balancing is an extremely complex issue: while moving an application may at first appear to make sense, the cost of that move can be hard to determine. For example, moving SQL Server may not only incur the cost of having to re-mount the database, but it also loses the in-memory cache and state, which may have a significant impact on performance from the client's perspective.
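As a simple illustration of the manual load-balancing decision, the following Python sketch picks the most heavily loaded node and suggests moving its smallest group to the least loaded node. The load figures and group names are invented; in practice the figures would come from performance counters, and the cost of moving the group (for example, losing an in-memory cache) still has to be weighed by the administrator.

# Illustrative sketch of the manual load-balancing decision.

node_loads = {"Node1": 0.85, "Node2": 0.30, "Node3": 0.45}
groups_on = {
    "Node1": {"FileShare1": 0.40, "PrintSpool1": 0.15},
    "Node2": {"FileShare2": 0.30},
    "Node3": {"App1": 0.45},
}

def suggest_move(node_loads, groups_on):
    """Suggest moving the smallest group off the busiest node to the idlest node."""
    busiest = max(node_loads, key=node_loads.get)
    idlest = min(node_loads, key=node_loads.get)
    group = min(groups_on[busiest], key=groups_on[busiest].get)
    return group, busiest, idlest

print(suggest_move(node_loads, groups_on))   # ('PrintSpool1', 'Node1', 'Node2')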