“z/OS Multi-Site Business Continuity”
September, 2012
Robert F. Kern
E-mail: BOBKERN@US.IBM.COM
Notices
Copyright © 2012 by International Business Machines Corporation.
No part of this document may be reproduced or transmitted in any form without written
permission from IBM Corporation.
The information provided in this document is distributed “AS IS” without any warranty,
either express or implied. IBM EXPRESSLY DISCLAIMS any warranties of
merchantability, fitness for a particular purpose OR INFRINGEMENT.
IBM shall have no responsibility to update this information.
IBM products are warranted according to the terms and conditions of the agreements (e.g.,
IBM Customer Agreement, Statement of Limited Warranty, International Program
License Agreement, etc.) under which they are provided. IBM is not responsible for the
performance or interoperability of any non-IBM products discussed herein.
The provision of the information contained herein is not intended to, and does not, grant
any right or license under any IBM patents or copyrights. Inquiries regarding patent or
copyright licenses should be made, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
USA
Trademarks
The following trademarks may appear in this paper.
AIX, AS/400, DS8000, Enterprise Storage Server, Enterprise Storage Server Specialist,
ESCON, FICON, FlashCopy, Geographically Dispersed Parallel Sysplex, HyperSwap,
IBM, iSeries, OS/390, RMF, System/390, S/390, Tivoli, TotalStorage, z/OS, and zSeries
are trademarks of International Business Machines Corporation or Tivoli Systems Inc.
Other company, product, and service names may be trademarks or registered trademarks
of their respective companies.
Abstract
Clients look for ways to reduce their TCO, simplify operations, and provide better service
to their customers. A trend in Business Continuity today is that more and more clients are
looking to develop multi-site Continuous Operations and D/R strategies built around
regularly switching the site at which production runs. The concept of toggling between
sites, or doing site flip/flops, is receiving growing attention. Most clients who toggle
between sites today do so with the full GDPS/PPRC HyperSwap functionality deployed
with a Multi-Site Workload. This configuration provides the ability to switch sites in real
time, with minimal interruption to the business. Another emerging trend is for clients with
out of region data centers to examine how they might best accomplish the same business
objective of switching sites, while minimizing the impact to their business during the site
switch operation.
Introduction
This paper explores the various GDPS configuration deployments that clients have
implemented to provide high availability/continuous operations locally and/or out of
region disaster recovery protection. It also explores the trend towards trying to reduce
D/R testing costs by moving toward a ‘regular site switch’ or ‘site toggle’ model. The
paper examines each of the following aspects:
- 2 sites within metro/sysplex distance
  - active/active (multi-site workload) with HyperSwap and Parallel Sysplex exploitation: non-disruptive flip/flop
  - active/standby (single-site workload) with HyperSwap and Parallel Sysplex exploitation: non-disruptive flip/flop is possible with an appropriate configuration and a temporary performance impact. Applications that do not exploit sysplex incur an outage during the site move. Disruptive site switches are typically automated to minimize the outage duration.
- 2 sites beyond metro/sysplex distance, or using asynchronous data replication: disruptive switch, but automated to minimize the outage duration
  - Active/Standby – DB2 & IMS application disaster/recovery at distance. Two separate Sysplexes at distance with application-level Active/Standby across the two Sysplexes, utilizing application-specific software-based data replication technology.
- 3-site configurations and benefits
- Future vision
The traditional two site model provides for Site 1 as the “primary production” site
and Site 2 as the “backup or remote recovery” site. The regular site toggle model is a
peer-to-peer relationship model where production can run at either site and switching
sites for “business reasons” on a regular basis becomes the business norm. An
active/active model that enables a site switch with minimal performance impact can
be realized by clients through the following:
- sysplex-enabled applications
- deployment of a multi-site workload under GDPS/PPRC with HyperSwap
- duplication of all site resources across the two sites.
As distances between sites increase, data replication must switch from synchronous to
asynchronous techniques to avoid application performance impacts. In addition,
parallel sysplex distances are typically determined by the acceptable CF link
performance for the various applications as well as the STP timer distance limit
(200 km). With these types of configurations a site switch is possible, but
an automated sysplex wide IPL is required. End to end automation like GDPS can
minimize the outage time to perform the site switch.
This paper will discuss trends and directions in this arena for z/OS.
High Availability/Continuous Operations & Out of Region Disaster
Protection
IT Infrastructure Availability can be broken down into three pieces: High
Availability, Continuous Operations, and Disaster/Recovery. Each brings unique
requirements when addressing Business Continuity. Through an understanding of the
client's business requirements in this arena, IBM can help tailor the right solution at the
right cost point for any IT infrastructure.
Business Continuity - Aspects of Availability:
- High Availability: fault-tolerant, failure-resistant infrastructure supporting continuous application processing
- Continuous Operations: non-disruptive backups and system maintenance coupled with continuous availability of applications
- Disaster Recovery: protection against unplanned outages such as disasters through reliable, predictable recovery
Together these ensure that critical business data is protected, recovery is predictable and reliable, operations continue after a disaster, and costs are predictable and manageable.
GDPS Solutions Overview
GDPS (Geographically Dispersed Parallel Sysplex) shipped originally in 1998 and
introduced the concept of multi-site IT infrastructure resource management for the
Sysplex. GDPS automation extends base sysplex and Parallel Sysplex management on
z/OS into an end-to-end solution that manages servers, workloads, and data, with a
coordinated network switch, within a single site or across multiple sites, providing
continuous operations for clients. To accomplish this, GDPS automation interfaces with
many different System z hardware and software interfaces to reduce the need for skilled
personnel to perform various operations during a site switch. Some of these interfaces
include:
- System z Hardware Management Console (HMC) to manage System z hardware reconfigurations dynamically (e.g., CBU, expanding LPARs, system IPLs, etc.)
- Sysplex & STP timer interfaces
- CF Duplexing interfaces
- DS8000 data replication functions – FlashCopy, z/OS Global Mirror (XRC), Metro Mirror (PPRC), and Global Mirror
- Various z/OS system interfaces
- z/OS integration with various DS8000 synergy items
GDPS is storage vendor independent as all major storage vendors on the System z
platform can participate in solutions using their implementation of the IBM DS8000 Disk
Storage Subsystem data replication architecture of Metro Mirror, FlashCopy and zGM
(XRC). New features and functions are developed with the IBM Systems Storage team
on the DS8000. IBM sells the Host to Storage Subsystem “architecture” to the other
storage vendors. Those vendors then implement the feature/function on their disk
subsystems based on the Host to Disk Storage Subsystem architected interfaces. So, the
disk storage subsystem internal processing for a feature or function may be different from
one vendor to another. Depending on the specific feature/function, there is generally a
period during which the feature/function is only available on the DS8000. One should
consult with each storage vendor to understand specific feature/function support for any
DS8000 storage subsystem enhancement.
In addition, the GDPS automation inter-operates with all major system automation
packages available for System z.
Relative to Business Resiliency/Business Continuity, IBM’s Flagship product is GDPS.
GDPS comes in a variety of different flavors/solutions. The following two charts
illustrate the various solutions.
GDPS provides an entry level solution called GDPS HyperSwap Manager, focused on
providing the HyperSwap availability solution for z/OS on the same data center floor or
across two local area data centers up to 200km with Parallel Sysplex.
GDPS/PPRC HyperSwap is the full-function version of HyperSwap Manager, and
HyperSwap Manager can easily be upgraded to it. The full-function GDPS/PPRC
HyperSwap supports z/VM and zLinux data along with z/OS data. In addition to masking
disk subsystem failures, the full-function version exploits Parallel Sysplex to mask CEC
failures, persistent sessions to coordinate a network switch, CF Duplexing to manage CF
structure failures, and VTS PtP to mask tape subsystem failures. Finally, if the failures
evolve into a disaster scenario, GDPS provides a complete end-to-end site
failover/fallback capability for both planned and unplanned site switches. With one
mouse click, the server, data, workload, and a coordinated network site switch are
performed via automation. All data is recovered, the Sysplex is IPLed, and databases are
restarted, followed by the applications. Skilled personnel are no longer required to get the
Sysplex up and running in the event of a disaster.
GDPS/GM (System z & Open Systems data) & GDPS/XRC (z/OS & zLinux only)
provide site failover/failback (FO/FB), typically “out of region” exploiting IBM’s Global
Mirror and zGM (XRC) data replication technologies.
GDPS/MzGM and GDPS/MGM provide a combination of high availability/continuous
operations locally coupled with out of region D/R protection. All GDPS solutions are
fully automated, proven, auditable, and in the case of PPRC and zGM (XRC) storage
vendor independent!
The various GDPS solutions also support z/VM and zLinux data through a feature called
xDR.
The GDPS System z umbrella also includes the ability for GDPS automation to inter-
operate with System p, System x, System i (Linux), Windows, HP, and Sun servers
through the GDPS/DCM (Distributed Cluster Manager) automation inter-operability
feature, which works in conjunction with Tivoli System Automation Application Manager
(SA AppMan) and/or the Symantec Veritas Cluster Server solutions. With GDPS and the
xDR and/or DCM features, a single mouse click can yield a coordinated site
failover/fallback of all of the customer's systems (e.g., System z (z/OS, zLinux, z/VM)
coordinated with, say, System p AIX systems). The disk replication functions can be
managed separately with the GDPS and DCM automation or together, depending on the
client's requirements for cross-platform data consistency.
GDPS is built upon the IBM DS8000 storage-based data replication architecture for
FlashCopy, Metro Mirror, z/OS Global Mirror, and Global Mirror. As new features and
functions are implemented in the DS8000, GDPS automation is modified to exploit them.
In addition, GDPS supports various DS8000 base box features used in conjunction with
the various advanced functions.
IBM DS8000 Metro Mirror and Global Mirror support a function known as ‘Open LUN
Support’, such that through an ECKD device address, GDPS automation is able to
manage the Metro Mirror and/or Global Mirror functions for distributed system LUNs.
This is also true for Metro Global Mirror configurations. With Open LUN support,
GDPS can provide a single restart point across the platforms. More systems and data
replication alternatives will continue to be provided in the future based on client
requirements. This is especially important for clients that have multi-platform
applications where, for example, transactions are initially received by a Windows system,
then routed to, say, an AIX system, and then to the “backend” z/OS system. Each system
may save data and, as a result, to recover the “application”, multiple platforms must be
recovered to the same point in time. GDPS inter-operability with Tivoli AppMan and/or
Symantec Veritas Cluster Server can provide such a solution for clients.
Open LUN Support is also important for clients with applications like SAP, where the
user interfaces are typically on non-System z platforms and the backend database runs on
z/OS. In some cases clients have moved the application components that were running on
non-System z platforms to zLinux, but many clients resist introducing the risk of any
change to critical production applications that have been running for some time. Open
LUN Support can provide a data consistency solution for multi-platform applications. All
data is recovered to a single point in time, enabling each platform's database to perform a
database Restart operation instead of a database Recover operation when a site switch
occurs. The database restart process manages all “in flight” and “in doubt” transactions,
which in turn permits the application components spread across the different platforms to
resume processing from the restarted point in time forward. GDPS automation, when
combined with the DCM automation feature, can inter-operate across the enterprise to
provide a complete business solution for clients in the area of IT business continuity.
This critical business function is made possible by the DS8000 Open LUN support.
Two Local Data Centers - 2 sites within metro/sysplex distance
The full GDPS/PPRC HyperSwap implementation can be configured as an active/active
“multi-site workload” or an active/standby “single-site workload”, providing real-time
planned and unplanned site switches through the deployment of the following
features/functions:
- Parallel Sysplex – permits the movement of a workload from one processor at site 1 to an alternate CEC in site 2.
- Sysplex-enabled applications (required for multi-site workloads).
- HyperSwap – permits disk access to switch from the Metro Mirror primary volume(s) to the target volume(s) and reverses the mirror without an IPL of the Parallel Sysplex.
- VTS Peer-to-Peer Tape configuration – permits real-time tape mirroring across multiple physical tape libraries without interrupting operations.
- Multiple Sysplex Timers – permit timer switches in real time.
- CF Duplexing – permits switching of data structure access in real time.
- Persistent sessions – enable real-time network switches.
Some customer applications have affinities (e.g., all transactions of a given type must
be routed to a specific system, one transaction passes information on to the next
transaction, etc.). A sysplex enabled application requires that all affinities be removed
so a transaction can be routed to & execute on any clone of the application on any
system in the sysplex. When this is done, the application can then be run in an
active/active, multi-site workload configuration. Transactions can be distributed to run
on any system within the Sysplex, independent of their physical location.
Through GDPS automation, more and more clients perform both Planned and
Unplanned site switches on a regular basis. Planned site switches are used to minimize
the production risks associated with site or equipment maintenance. Once a lights out
data center opens its doors for maintenance operations, the possibility exists for
production impacts. These can be minimized by switching production to the alternate
site in real time with a multi-site workload configuration. Providing the ability for a
client to exploit this type of operational functionality has spurred clients to think of new
approaches and new business exploitations of the technology.
GDPS/PPRC: a Continuous Availability and/or Disaster Recovery Solution at Metropolitan Distance (Site 1 and Site 2, each with its own network):
- Manages the multi-site Parallel Sysplex, processors, CBU, CF, and couple data sets
- Manages disk remote copy (System z & Open LUN)
- Manages tape remote copy (PtP VTS)
- Exploits the HyperSwap & FlashCopy functions
- Automated planned and unplanned actions (z/OS, CF, disk, tape, site)
- Improves availability of heterogeneous System z business operations
- Handles planned and unplanned exception conditions
The above diagram shows a high-level view of the GDPS/PPRC topology. The physical
topology of a GDPS/PPRC consists of a base or Parallel Sysplex cluster spread across
two sites (known as site 1 and site 2) with one or more z/OS systems at each site,
separated by up to 200 kilometers (km). The multi-site sysplex must be configured with
redundant hardware (e.g., a Coupling Facility and a Sysplex Timer in each site) and the
cross site connections (typically dedicated or ‘dark’ fibre) must be redundant. All critical
data is mirrored from the primary site (site 1 in this diagram) to the secondary site (site 2).
All Shared CF structures are located on the primary site coupling facilities. Therefore,
when transactions are executed on the processors at the remote site, disk I/O and Shared
CF structure access is through links from the secondary site to the primary site and the
disk I/O and CF structure updates are then mirrored in a synchronous manner back to the
remote site. This adds overhead to the application's disk I/O as well as to any access to
shared CF structures. Before electing to deploy a multi-site configuration, a customer
must first ensure that the applications are sysplex enabled, after which careful
consideration must be given to the system and application performance impact of these
two access paths when a transaction is executed at the remote site. In many cases the
application performance impact will limit the effective distance that an active/active
configuration can actually sustain.
For disk I/O, the Metro Mirror performance impact rule of thumb is:
1. the disk subsystem overhead of MM at zero distance, plus
2. the speed of light through dedicated “dark” fibre for a single protocol exchange (a linear function of 1 ms/100 km, or 0.1 ms/10 km), times
3. the number of protocol exchanges implemented in the specific MM disk-to-disk implementation (for IBM DS8000 MM, a single protocol exchange is accomplished through a feature called pre-deposit write), plus
4. other device overheads that may be on the fibre path (e.g., switches, DWDMs, compression and/or encryption devices, channel extenders, etc.).
For CF signal latency, the rule of thumb is:
Signal latency impact (round trip) = 10 US/KM * fiber distance KM * # of protocol
exchanges
Example: assume two sites separated by 10 KM with a processor in site 1 accessing disk in
site 2; signal latency impact = 10 US/KM * 10 KM * 1 (FICON has one protocol exchange) =
100 US impact
Terminology:
► Kilometer (KM) – one KM equals approximately 5/8 mile
► Millisecond (MS) – 10**-3 seconds
► Microsecond (US) – 10**-6 seconds
For most clients, the impact of CF signal latency beyond 40-50 km (25-30 miles) yields
too great an application impact. Because of this, GDPS/PPRC multi-site implementations
typically tend to be at campus or metro distances.
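To make these two rules of thumb concrete, the short Python sketch below applies them to a few example configurations. It is only an illustration of the arithmetic described above; the distance, protocol-exchange count, and overhead values are assumed for the example rather than taken from any measured configuration.

```python
# Illustrative sketch of the latency rules of thumb above.
# All numeric inputs are example assumptions, not measured values.

US_PER_KM_ROUND_TRIP = 10.0  # ~10 microseconds of signal latency per km of fibre, round trip


def remote_access_latency_us(distance_km: float,
                             protocol_exchanges: int = 1,
                             zero_distance_overhead_us: float = 0.0,
                             path_device_overhead_us: float = 0.0) -> float:
    """Rule of thumb: overhead at zero distance, plus round-trip fibre
    latency times the number of protocol exchanges, plus any other
    overheads introduced by devices on the fibre path."""
    fibre_us = US_PER_KM_ROUND_TRIP * distance_km * protocol_exchanges
    return zero_distance_overhead_us + fibre_us + path_device_overhead_us


# The example from the text: 10 km, one protocol exchange (FICON) -> 100 US
print(remote_access_latency_us(10, protocol_exchanges=1))           # 100.0

# A hypothetical 2-exchange mirroring implementation at 10 km with an
# assumed 50 US of DWDM/channel-extender overhead on the path
print(remote_access_latency_us(10, protocol_exchanges=2,
                               path_device_overhead_us=50))          # 250.0

# The same single-exchange access at 50 km, near the practical CF limit
print(remote_access_latency_us(50, protocol_exchanges=1))            # 500.0
```

Under these assumptions the penalty grows linearly with distance and with the number of protocol exchanges, which is why a single-exchange implementation such as pre-deposit write matters and why active/active configurations are usually limited to campus or metro distances.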
If customer applications are not sysplex enabled, and/or the application performance
impact of a multi-site configuration is too great, then the choice for these clients
becomes GDPS/PPRC w/HyperSwap in a single-site (active/standby) configuration. In
this configuration, all hardware can be duplicated across the two sites. The secondary-site
processor will typically run the GDPS control system, referred to as the K-sys. Both
planned and unplanned site switches involve re-IPLing all systems in the Sysplex at the
recovery site, after automation has recovered and switched all dependent resources.
GDPS/PPRC prerequisites include NetView and System Automation for z/OS. GDPS
automation also interacts with any existing automation products. With a multi-site
Parallel Sysplex, this provides a Continuous Availability/Continuous Operations and a
Disaster Recovery solution. In addition, GDPS provides a set of panels for standard
actions as well as the ability to customize scripts for an installation.
GDPS/PPRC multi-site sysplex. At least one system in Site 2 is in the Site 1
production Sysplex. All production can run in Site 1 with the GDPS “K-sys” running in
Site 2, or production can run in either or both Site 1 and Site 2. Sysplex timers and CFs
are in both sites. Two fiber trunks (for availability) are recommended to connect the
sites. For unplanned reconfigurations, system failures, or processor failures, systems can
be restarted in place or at the other site, depending upon how they are defined.
GDPS/PPRC single-site sysplex. All production images run at the primary site. The
GDPS “K-sys” typically runs at Site 2, and all resources are typically available at both
sites. Sysplex timers and CFs are in both sites. Two fiber trunks (for availability) are
recommended to connect the sites.
The following outlines the typical resources available at each site for GDPS/PPRC
w/HyperSwap:
- Base Sysplex or Parallel Sysplex environment
- Manages unplanned reconfigurations
  - z/OS, CF, disk, tape, & coordinates network connections
  - Designed to maintain data consistency and integrity across all volumes
  - Fast, automated site failover
  - No or limited data loss
- Single point of control for
  - Standard actions: Stop, Remove, IPL system(s)
  - Parallel Sysplex configuration management
  - Couple data set (CDS) and Coupling Facility (CF) management
  - User-defined scripts (e.g., planned site switch)
  - PPRC configuration management
2 Sites Beyond Metro/Sysplex Distance
GDPS solutions beyond metro/Sysplex distance include GDPS/XRC and GDPS/GM.
Clients select either the XRC or GM data replication technique based on their specific
requirements. XRC provides the lowest possible RPO and supports only z/OS and
zLinux data. Global Mirror provides a tunable RPO (3-5 seconds to 18 hours) and
supports all System z and distributed systems data.
With asynchronous data replication solutions a site switch will require an automated
Sysplex wide IPL. Asynchronous data replication can support a “Planned Site Switch”
with no loss of data, but to do this the applications must be shut down. Storage based
data replication technology today supports planned site Failover/Failback scenarios
such that only changed data need be copied back to resync the sites. This capability is
available today with the various flavors of GDPS 2-site and 3-site solutions. But, in
each case a Sysplex wide IPL, data replication disk/tape switch, and a client end user
network switch must be done in a coordinated manner. In this way the Sysplex is
restarted as well as all data bases and application(s) workloads at the remote site.
When the various data bases are restarted, “In Flight” and “In Doubt” transactions are
resolved as well as a “rebuild” of any and all coupling facility structures.
If a “planned outage” can be tolerated by the client, then switching sites on a regular
basis can help to minimize the costs involved with D/R testing. Planned site switches
verify that all the resources required to run the application are available in both sites.
This can then be fully tested to ensure that enough capacity (processor, storage, network,
etc.) is available at both sites for any and all combinations of the workload. In addition,
the client is testing the complete production application(s) end to end. Often, traditional
D/R tests only verify that the ‘system platform’ can be IPLed and, based on the time
available, some minimal subset of the production workload is executed. The best D/R
test is a site switch that in fact leaves production running in each site for a reasonably
long period of time (e.g., 3-6 months). During this time, the application typically goes
through various periods of the business cycle, including end-of-day, end-of-week, and
end-of-quarter processing. Through careful planning, one can eventually verify that all
application processing can be executed independent of site.
This approach fits into some business models better than others. In some countries a
physical site utility check is required once a year. This requires a full electrical
shutdown. Therefore a site switch to the other production site may be easier in this
environment, as the outage is limited to the time needed to perform the site switch and
bring the application(s) back up and running, rather than also including the time to verify
all utilities at the original production site.
The simple approach to ensuring that a client can easily switch sites and run all
applications with similar performance, scalability, and capacity for growth is to duplicate
all hardware and software resources across both sites. If a client has currently deployed
a 3-site GDPS configuration with GDPS/PPRC HyperSwap locally at the production site,
one would also want to deploy the same configuration at the target sister production site.
This would typically be called a 4-site configuration and is pictured below.
The emerging thought is that money currently spent on Disaster Recovery testing could
be decreased if one could provide, on a regular basis, the ability to switch back and forth
across sites in an automated fashion. When implemented, planned site switches provide
this function. That means D/R testing need only verify that the unique automation
required to perform a site switch for an unplanned scenario also works. Customers
minimize the differences between planned and unplanned site switch scenarios today by
deploying the “test the way we recover and recover the way we test” model. Today,
several clients’ D/R testing is done at the remote site while maintaining full D/R
protection. This is done by making a point-in-time (PiT) FlashCopy of the data and
performing all D/R testing against that copy of the data. When a disaster occurs, as part
of the recovery process, a FlashCopy of the data is created and used for the D/R recovery
process. This minimizes the unique actions between planned and unplanned site failover
scenarios.
In both the planned and unplanned site switch scenarios, GDPS automation can
minimize the duration of the outage, i.e., the RTO. GDPS automation can also help to
minimize the risk of performing a site switch, as the automation is proven, repeatable,
and minimizes human errors. The Recovery Time Objective is a measure of the time
from when a planned or unplanned site switch is initiated until all applications are up
and running at the remote site. A key benefit of GDPS automation is that, once
implemented, the RTO is a known, proven, repeatable quantity.
GDPS/Active/Standby - Application by Application Availability:
If all of a client’s application data is within a single data base (DB2 and/or IMS),
clients can implement high availability across two sites on an application by
application basis rather than managing high availability/disaster protection on a
platform(s) basis.
GDPS/Active/Standby automation enables automated ‘application level’ site switches
that typically provide an RTO on the order of seconds to minutes. Clients use DB2-to-DB2
software data replication with IBM Tivoli InfoSphere Replication Server for z/OS and/or
IMS-to-IMS software data replication with IBM Tivoli InfoSphere Classic Replication for z/OS.
In this case the DB2/IMS log entries are replicated between sites by DB2/IMS. An
active z/OS image with a copy of the DB2/IMS database is running at the remote site,
and all DB2/IMS updates are applied when received. In the event of a disaster or a
planned site switch for this application, the end-user network is switched to route active
transactions to the remote site for processing with minimal data loss. The routing of
transactions is managed by the IBM Workload Distributor software.
This approach typically also requires the client to implement a strict change control
process across all systems to ensure that the various system components are always
updated in step, keeping the z/OS images in sync. The following picture outlines the
GDPS/Active/Standby solution.
3-Site Configurations
Several clients with an out of region D/R implementation or with high availability
locally have moved to a 3-site configuration by implementing either GDPS/MzGM
w/HyperSwap or GDPS/MGM w/HyperSwap. These configurations provide ‘local’
high availability/continuous operations environments supporting local real-time planned
site switch scenarios as well as site failover/failback functionality for a local site disaster
with an RPO of zero. Some clients implement their second ‘local’ site on the same data
center floor, or across a fire wall on the same data center floor. A few customers have
implemented just HyperSwap locally to prevent a disk subsystem failure from causing a
Sysplex-wide outage. In all cases, the implementation focus was on increasing the
availability of IT to the business locally or adding out of region D/R protection. One key
cost component in developing a multi-site solution is the duplication of the client's end-
user network. Depending on the complexity and cost associated with replicating the end-
user network, several clients prefer to implement a ‘3-site’ solution across only two
physical sites.
At this time, IBM has deployed some 80+ GDPS/MzGM w/HyperSwap or GDPS/MGM
w/HyperSwap multi-site configurations. The following figures outline these
implementations.
GDPS/MzGM w/HyperSwap & Incremental Resync (Site1, Site2, and a Recovery Site; ETR or STP time synchronization; FlashCopy (F) recommended at the Recovery Site):
- Data replication A -> B (Metro Mirror with HyperSwap) and A -> C (z/OS Global Mirror to the SDM at the Recovery Site)
- Incremental resynch B -> C if Site1 or the A-disk fails
- Maintains the disaster recovery position
- Improved RTO
- Optional: CFs / production systems in Site2
The standard GDPS/MzGM HyperSwap with Incremental Resync configuration
enables data replication from A -> B with HyperSwap and z/OS Global Mirror data
replication from A -> C. On an A->B HyperSwap event, the Incremental
Resynchronization for GDPS MzGM enables the reestablishment of the z/OS Global
Mirror session from A->C to B->C. GDPS manages the z/OS Global Mirror sessions,
so that only changed tracks need to be sent to the recovery site instead of requiring a
full-volume copy to reestablish the disaster recovery copy. This can greatly reduce the
time required (in some cases from hours down to minutes) to reconnect to the remote
site, reducing the risk of not being protected.
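The value of the incremental resynchronization can be seen with a small conceptual sketch. The change-recording class, volume size, and track numbers below are illustrative assumptions made for the example, not the DS8000 or GDPS implementation; the point is simply that recording which tracks changed while replication to the recovery site was suspended lets the re-established session send only those tracks.

```python
# Conceptual sketch only (not the DS8000/GDPS implementation): writes made
# while the remote-copy session is down are recorded in a change-recording
# bitmap, so re-establishing the session sends only the changed tracks
# instead of performing a full-volume copy.
# The volume size and track numbers are illustrative assumptions.

class ChangeRecordingVolume:
    def __init__(self, total_tracks: int):
        self.total_tracks = total_tracks
        self.changed_tracks: set[int] = set()

    def write(self, track: int) -> None:
        # Record every track updated while replication to the recovery site is suspended.
        self.changed_tracks.add(track)

    def resync_to_recovery_site(self) -> int:
        # Send only the recorded tracks, then clear the bitmap.
        sent = len(self.changed_tracks)
        self.changed_tracks.clear()
        return sent


vol = ChangeRecordingVolume(total_tracks=50_000)      # assumed volume size
for track in (12, 7_431, 18_002, 44_915):             # assumed writes after the swap
    vol.write(track)

print(f"incremental resync sends {vol.resync_to_recovery_site()} tracks "
      f"instead of {vol.total_tracks} for a full-volume copy")
```

Under these assumptions, only the handful of changed tracks cross the long-distance link, which is what turns an hours-long full-volume recopy into minutes of catch-up traffic.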
GDPS/MGM w/HyperSwap (Site1, Site2, and a Recovery Site; ETR or STP time synchronization; FlashCopy (F) recommended at the Recovery Site):
- Metro Mirror between Site1 and Site2, Global Mirror to the Recovery Site
- The GM K-Sys (Kg) runs in a production LPAR, providing HyperSwap protection and a reduced resource requirement
The standard GDPS/MGM w/HyperSwap configuration provides data replication from
A->B->C. The GDPS/GM K-sys (Kg) can run in a GDPS/PPRC production system,
reducing the number of z/OS images required for an MGM configuration:
- Incremental resync A->C if Site2 or the B-disk fails (requires A->C bandwidth)
- GDPS/GM K-sys runs in a production system, with HyperSwap protection for the GDPS/GM K-sys and a reduced resource requirement
- Maintains the disaster recovery position following resync
- Improved RPO
The Kg system lives in P2, which is a production system. P2 runs GDPS/PPRC in one
NetView and the GDPS/GM K-sys function in another NetView. The P2 disk is mirrored
with PPRC and protected by HyperSwap, including any disk related to the "Kg system
function". P2 can live in either Site 1 or Site 2 and carries the Kg system with it; when
P2 is moved, the Kg system function moves with it.
3-site configurations provide additional options, as well as considerations, when
performing site switches.
1. If the two local sites are physically separated for both high availability and local D/R protection, when a remote site switch occurs, is it still a requirement to have two local sites physically split at that location as well? The alternative would be to have two logical sites within the same physical site, perhaps separated by a physical fire wall. In the site toggle model this consideration may be very different than if the remote site is only used in the event of a disaster. In the disaster-site scenario, when a disaster occurs, high availability may be added to that site after the business is back up and running again. The site toggle model views all sites as ‘production ready’ sites, whereas the disaster/recovery site model views the remote site as only being actually used in the event of a disaster. Both models are valid, and the choice varies based on the client's business requirements.
2. A complete understanding of the various fallback scenarios, and of the additional copies of the disk required to support each of these scenarios, should be developed for both the GDPS/MzGM and the GDPS/MGM options.
3. As mentioned above, end-user network connectivity to each data center can definitely influence the costs associated with the ultimate solution.
A recognized customer requirement in this area is to provide exactly the same
functionality at the target site (high availability plus disaster recovery protection) after
both a planned and, when possible, an unplanned site switch. That is, the ability to use
asynchronous data replication back to the original production site as well as to provide
local HyperSwap functionality. With this functionality, both sites offer the business
equal capability, enabling a peer site configuration.
Distributed Systems
As mentioned earlier in this paper, with the GDPS/DCM capability, GDPS automation
can inter-operate with either Tivoli AppMan or Veritas Cluster Server to provide end to
end automated management of various distributed platforms in 2-site or 3-site
configurations. Cross System data consistency can also be provided via the DS8000
open lun support. With this function, GDPS can provide a common restart point across
all z/OS and distributed systems data. Today, high availability of data is provided through
distributed systems software mirroring, typically called LVM mirrors. Data availability
for disaster recovery can be provided through hardware- and software-based data
replication functions. Functionality in this arena will continue to evolve as clients
develop more and more cross-platform applications.
Future Vision
The next chart outlines the evolution from a single server into an Enterprise Wide
Business Continuity Solution. Single Servers, became clustered servers, clustered
servers then spanned physical sites. This was then extended to end to end multi-site
heterogeneous clusters, followed by integrated end to end multi-site clusters. The
emerging trend for z/OS is next toward multiple application-level Active/Active sites at
distance, coupled with the traditional platform-based high availability and
disaster/recovery solutions.
Conclusion
The requirements for real time high availability, continuous operations and disaster
recovery for z/OS as well as distributed systems continue to push IBM to provide 24x7
computing environments with superior business resilience functionality.
New Smarter Planet applications typically deal with real time data that needs to be
captured, stored and analyzed in real time on a 24x7 basis. These applications and
volumes of data also introduce new requirements in scalability as well as challenges in
total cost of ownership. The management of IT Operations across a single site or multiple
sites locally or at distance, presents the opportunity to optimize all compute resources to
maximize their utilization, as well as enable them to meet the business requirements of
the end-user clients today and tomorrow. The ultimate goal, reflected in emerging trends,
is to enable applications and their platforms to be virtualized and run across physical data
centers located around the world. The z/OS platform, coupled with GDPS automation,
has become the leading edge of general-purpose solutions toward this end.
Author
Bob Kern – IBM Advanced Technical Support, Americas (bobkern@us.ibm.com). Mr. Kern is an IBM
Master Inventor and Executive IT Architect. He has 36 years of experience in large system design and
development and holds numerous patents on storage-related topics. For the last 28 years, Bob
has specialized in disk device support and is a recognized expert in continuous availability,
disaster recovery, and real-time disk mirroring. He created the DFSMS/MVS subcomponents for
Asynchronous Operations Manager and the System Data Mover. Bob was named a Master Inventor
by the IBM Systems & Technology Group in 2003 and is one of the inventors of Concurrent Copy,
PPRC, XRC, GDPS, and zCDP solutions. He continues to focus in the disk storage architecture area
on HW/SW solutions for continuous availability and data replication. He is a member of the GDPS
core architecture team and the GDPS Customer Design Council with a focus on storage-related topics.
 

Recently uploaded (20)

Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst Summit
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for Success
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
 
Pitch Deck Teardown: NOQX's $200k Pre-seed deck
Pitch Deck Teardown: NOQX's $200k Pre-seed deckPitch Deck Teardown: NOQX's $200k Pre-seed deck
Pitch Deck Teardown: NOQX's $200k Pre-seed deck
 
KestrelPro Flyer Japan IT Week 2024 (English)
KestrelPro Flyer Japan IT Week 2024 (English)KestrelPro Flyer Japan IT Week 2024 (English)
KestrelPro Flyer Japan IT Week 2024 (English)
 
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc.../:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...
 
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
 
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...
 
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
 
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
 
Best Practices for Implementing an External Recruiting Partnership
Best Practices for Implementing an External Recruiting PartnershipBest Practices for Implementing an External Recruiting Partnership
Best Practices for Implementing an External Recruiting Partnership
 
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfIntro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear Regression
 
rishikeshgirls.in- Rishikesh call girl.pdf
rishikeshgirls.in- Rishikesh call girl.pdfrishikeshgirls.in- Rishikesh call girl.pdf
rishikeshgirls.in- Rishikesh call girl.pdf
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
 
Vip Female Escorts Noida 9711199171 Greater Noida Escorts Service
Vip Female Escorts Noida 9711199171 Greater Noida Escorts ServiceVip Female Escorts Noida 9711199171 Greater Noida Escorts Service
Vip Female Escorts Noida 9711199171 Greater Noida Escorts Service
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Service
 

“z/OS Multi-Site Business Continuity” September, 2012

  • 4. Page 4: Introduction
This paper explores the various GDPS configuration deployments that clients have implemented to provide high availability/continuous operations locally and/or out-of-region disaster recovery protection. It also explores the trend toward reducing D/R testing costs by moving to a 'regular site switch' or 'site toggle' model. To do this, the paper examines each of these aspects:
- Two sites within metro/sysplex distance:
  - active/active (multi-site workload) with HyperSwap and Parallel Sysplex exploitation: non-disruptive flip/flop;
  - active/standby (single-site workload) with HyperSwap and Parallel Sysplex exploitation: a non-disruptive flip/flop is possible with the appropriate configuration and a temporary performance impact. Applications that do not exploit the sysplex incur an outage during the site move; disruptive site switches are typically automated to minimize the outage duration.
- Two sites beyond metro/sysplex distance, or using asynchronous data replication: a disruptive switch, but automated to minimize the outage duration.
- Active/Standby: DB2 and IMS application disaster recovery at distance. Two separate sysplexes at distance, with application-level Active/Standby across the two sysplexes using application-specific, software-based data replication technology.
- 3-site configurations and their benefits.
- Future vision.
The traditional two-site model provides for Site 1 as the "primary production" site and Site 2 as the "backup or remote recovery" site. The regular site toggle model is a peer-to-peer relationship model in which production can run at either site, and switching sites for "business reasons" on a regular basis becomes the business norm. An active/active model that enables a site switch with minimal performance impact can be realized through the following:
- sysplex-enabled applications;
- deployment of a multi-site workload under GDPS/PPRC with HyperSwap;
- duplication of all site resources across the two sites.
As distances between sites increase, data replication must switch from synchronous to asynchronous techniques to avoid application performance impacts. In addition, Parallel Sysplex distances are typically determined by the acceptable CF link performance for the various applications as well as the maximum STP timer distance (200 km). With these types of configurations a site switch is possible, but an automated, sysplex-wide IPL is required.
  • 5. Page 5: This paper will discuss trends and directions in this arena for z/OS.
High Availability/Continuous Operations and Out-of-Region Disaster Protection
IT infrastructure availability can be broken down into three pieces: High Availability, Continuous Operations, and Disaster Recovery. Each brings unique requirements when clients address Business Continuity. Through an understanding of a client's business requirements in this arena, IBM can help tailor the right solution at the right cost point for any IT infrastructure.
[Figure: "Business Continuity - Aspects of Availability"]
- High Availability: fault-tolerant, failure-resistant infrastructure supporting continuous application processing.
- Continuous Operations: non-disruptive backups and system maintenance coupled with continuous availability of applications.
- Disaster Recovery: protection against unplanned outages such as disasters through reliable, predictable recovery; protection of critical business data; recovery is predictable and reliable; operations continue after a disaster; costs are predictable and manageable.
GDPS Solutions Overview
  • 6. Page 6: GDPS (Geographically Dispersed Parallel Sysplex) originally shipped in 1998 and introduced the concept of multi-site IT infrastructure resource management for the sysplex. GDPS automation extends base sysplex and Parallel Sysplex management on z/OS into an end-to-end "server, workload, and data, with a coordinated network switch" resource management solution within a single site or across multiple sites, providing continuous operations for clients. To accomplish this, GDPS automation works with many different System z hardware and software interfaces to reduce the need for skilled personnel to perform various operations during a site switch. Some of these interfaces include:
- the System z Hardware Management Console (HMC), to manage System z hardware reconfigurations dynamically (e.g., CBU, expanding LPARs, system IPLs);
- sysplex and STP timer interfaces, and CF Duplexing interfaces;
- DS8000 data replication functions: FlashCopy, z/OS Global Mirror (XRC), Metro Mirror (PPRC), and Global Mirror;
- various z/OS system interfaces;
- z/OS integration with various DS8000 synergy items.
GDPS is storage vendor independent: all major storage vendors on the System z platform can participate in solutions using their implementation of the IBM DS8000 disk storage subsystem data replication architecture for Metro Mirror, FlashCopy, and zGM (XRC). New features and functions are developed with the IBM Systems Storage team on the DS8000. IBM sells the host-to-storage-subsystem "architecture" to the other storage vendors, who then implement the feature/function on their disk subsystems based on the architected host-to-disk interfaces. As a result, the disk subsystem's internal processing for a feature or function may differ from one vendor to another, and for a given feature/function there is generally some period during which it is available only on the DS8000. One should consult each storage vendor to understand its support for any specific DS8000 storage subsystem enhancement. In addition, GDPS automation inter-operates with all major system automation packages available for System z.
Relative to Business Resiliency/Business Continuity, IBM's flagship product is GDPS. GDPS comes in a variety of flavors/solutions; the following two charts illustrate them.
  • 7. Page 7: GDPS provides an entry-level solution called GDPS HyperSwap Manager, focused on providing the HyperSwap availability solution for z/OS on the same data center floor or across two local data centers up to 200 km apart with Parallel Sysplex. GDPS/PPRC HyperSwap is the full-function version of HyperSwap Manager, to which it can easily be upgraded. The full-function GDPS/PPRC HyperSwap supports zVM and zLinux data along with z/OS data. In addition to masking disk subsystem failures, the full-function version exploits Parallel Sysplex to mask CEC failures, persistent sessions to coordinate a network switch, CF Duplexing to manage CF structure failures, and VTS PtP to mask tape subsystem failures. Finally, if the failures evolve into a disaster scenario, GDPS provides a complete end-to-end site failover/fallback capability for both planned and unplanned site switches. With one mouse click, the server, data, workload, and a coordinated network site switch are performed via automation: all data is recovered, the sysplex is IPL'ed, and the databases are restarted, followed by the applications. Skilled personnel are no longer required to get the sysplex up and running in the event of a disaster.
GDPS/GM (System z and Open Systems data) and GDPS/XRC (z/OS and zLinux only) provide site failover/failback (FO/FB), typically "out of region," exploiting IBM's Global Mirror and zGM (XRC) data replication technologies. GDPS/MzGM and GDPS/MGM provide a combination of local high availability/continuous operations coupled with out-of-region D/R protection. All GDPS solutions are fully automated, proven, auditable, and, in the case of PPRC and zGM (XRC), storage vendor independent.
  • 8. Page 8: The various GDPS solutions also support zVM and zLinux data through a feature called xDR. The GDPS System z umbrella also includes the ability for GDPS automation to inter-operate with System p, System x, System i (Linux), Windows, HP, and Sun platforms through the GDPS/DCM (Distributed Cluster Manager) automation inter-operability feature, which works in conjunction with Tivoli System Automation Application Manager (SA AppMan) and/or the Symantec Veritas Cluster Server solutions. With GDPS and the xDR and/or DCM features, a single mouse click can yield a coordinated site failover/fallback of all of the customer's systems (for example, System z systems running z/OS, zLinux, and zVM coordinated with System p AIX systems). The disk replication functions can be managed separately by the GDPS and DCM automation, or together, depending on the client's requirements for cross-platform data consistency.
GDPS is built upon the IBM DS8000 storage-based data replication architecture for FlashCopy, Metro Mirror, z/OS Global Mirror, and Global Mirror. As new features and functions are implemented in the DS8000, GDPS automation is modified to exploit them. In addition, GDPS supports various DS8000 base box features used in conjunction with the various advanced functions. IBM DS8000 Metro Mirror and Global Mirror support a function known as 'Open LUN support', such that through an ECKD device address, GDPS automation is able to manage the Metro Mirror and/or Global Mirror functions for distributed-system LUN(s). This is also true for Metro Global Mirror configurations. With Open LUN support, GDPS can provide a single restart point across the platforms. More systems and data replication alternatives will continue to be provided in the future based on client requirements.
This is especially important for clients with multi-platform applications where transactions are, for example, initially received by a Windows system, then routed to an AIX system, and then to the "backend" z/OS system. Each system may save data, so to recover the "application," multiple platforms must be recovered to the same point in time. GDPS inter-operability with Tivoli AppMan and/or Symantec Veritas Cluster Server can provide such a solution for clients.
  • 9. Page 9: Open LUN support is also important for clients with applications like SAP, where the user interfaces typically run on non-System z platforms and the backend database runs on z/OS. In some cases clients have moved the application parts that were running on non-System z platforms to zLinux, but many clients resist introducing the risk of any change to critical production applications that have been running for some time. Open LUN support can provide a data consistency solution for multi-platform applications. All data is recovered to a single point in time, enabling each platform's database to perform a database Restart operation instead of a database Recover operation when a site switch occurs. The database restart process manages all "in flight" and "in doubt" transactions, which in turn permits the application parts spread across the different platforms to resume processing forward from the restarted point in time. GDPS automation, combined with the DCM automation feature, can inter-operate across the enterprise to provide a complete business solution for clients in the area of IT business continuity. This critical business function is made possible by the DS8000 Open LUN support.
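To illustrate why a single cross-platform consistency point is what makes a database Restart (rather than Recover) possible, the following is a purely conceptual sketch in Python. It is not GDPS, DS8000, or any product code; the Volume class, the freeze operation, and the volume names are hypothetical stand-ins for whatever mechanism actually suspends and captures the volumes.

# Conceptual sketch only: a dependent-write consistency point across platforms.
# Classes and method names are hypothetical, not a real API.

class Volume:
    def __init__(self, name):
        self.name = name
        self.writes = []          # writes applied so far, in order
        self.frozen_image = None  # point-in-time image taken at the freeze

    def write(self, record):
        self.writes.append(record)

    def freeze(self):
        # Capture a point-in-time image (stand-in for suspend/FlashCopy).
        self.frozen_image = list(self.writes)

def consistency_group_freeze(volumes):
    """Freeze every volume at the same logical point, before any further writes."""
    for v in volumes:
        v.freeze()

# A dependent-write sequence spanning two platforms: the database log record
# (z/OS volume) must be captured no later than the data page it describes
# (distributed LUN), or restart at the recovery site cannot work.
zos_log = Volume("zOS DB log")
aix_data = Volume("AIX data LUN")

zos_log.write("log: txn-42 commit intent")
consistency_group_freeze([zos_log, aix_data])   # common restart point
aix_data.write("page update for txn-42")        # arrives after the freeze

# Both images reflect the same point in time, so a database restart can resolve
# txn-42 as in-flight instead of finding a data page with no matching log record.
print(zos_log.frozen_image)   # ['log: txn-42 commit intent']
print(aix_data.frozen_image)  # []

Had each platform been captured independently at different points in time, the recovery site could hold a data page whose log record is missing, forcing a much longer Recover operation instead of a Restart.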
  • 10. Page 10: Two Local Data Centers - 2 sites within metro/sysplex distance
The full GDPS/PPRC HyperSwap implementation can be configured as an active/active "multi-site workload" or an active/standby "single-site workload," providing real-time planned and unplanned site switches through deployment of the following features/functions:
- Parallel Sysplex permits the movement of a workload from one processor at site 1 to an alternate CEC in site 2.
- Sysplex-enabled applications (required for multi-site workloads).
- HyperSwap permits disk access to switch from the Metro Mirror primary volume(s) to the target volume(s), and the mirror to be reversed, without an IPL of the Parallel Sysplex.
- A VTS Peer-to-Peer tape configuration permits real-time tape mirroring across multiple physical tape libraries without interrupting operations.
- Multiple Sysplex Timers permit timer switches in real time.
- CF Duplexing permits the switching of data structure access in real time.
- The concept of persistent sessions enables real-time network switches.
Some customer applications have affinities (e.g., all transactions of a given type must be routed to a specific system, or one transaction passes information on to the next transaction). A sysplex-enabled application requires that all affinities be removed so that a transaction can be routed to, and execute on, any clone of the application on any system in the sysplex. When this is done, the application can be run in an active/active, multi-site workload configuration: transactions can be distributed to run on any system within the sysplex, independent of their physical location.
Through GDPS automation, more and more clients perform both planned and unplanned site switches on a regular basis. Planned site switches are used to minimize the production risks associated with site or equipment maintenance. Once a lights-out data center opens its doors for maintenance operations, the possibility of production impacts exists. These impacts can be minimized by switching production to the alternate site in real time with a multi-site workload configuration. Providing the ability to exploit this type of operational functionality has spurred clients to think of new approaches and new business exploitations of the technology.
  • 11. Page 11: [Figure: "GDPS/PPRC: a Continuous Availability and/or Disaster Recovery Solution - Metropolitan Distance." The chart shows the Site 1 and Site 2 networks and notes that GDPS/PPRC manages the multi-site Parallel Sysplex (processors, CBU, CF, couple data sets), disk remote copy (System z and open LUN), and tape remote copy (PtP VTS); exploits the HyperSwap and FlashCopy functions; automates planned and unplanned actions (z/OS, CF, disk, tape, site); and improves availability of heterogeneous System z business operations for planned and unplanned exception conditions.]
The diagram above shows a high-level view of the GDPS/PPRC topology. The physical topology of GDPS/PPRC consists of a base or Parallel Sysplex cluster spread across two sites (known as site 1 and site 2), with one or more z/OS systems at each site, separated by up to 200 kilometers (km). The multi-site sysplex must be configured with redundant hardware (e.g., a Coupling Facility and a Sysplex Timer in each site), and the cross-site connections (typically dedicated or 'dark' fibre) must be redundant.
All critical data is mirrored from the primary site (site 1 in this diagram) to the secondary site (site 2). All shared CF structures are located on the primary-site coupling facilities. Therefore, when transactions are executed on the processors at the remote site, disk I/O and shared CF structure access go through links from the secondary site to the primary site, and the disk I/O and CF structure updates are then mirrored synchronously back to the remote site. This adds overhead to the application's disk I/O as well as to any access to shared CF structures. Before electing to deploy a multi-site configuration, a customer must first ensure that the applications are sysplex enabled, after which careful consideration must be given to the system and application performance impacts of these two accesses when a transaction is executed at the remote site. In many cases the application performance impact will limit the effective distance that an active/active configuration can actually sustain.
For disk I/O, the performance impact of Metro Mirror can be estimated with this rule of thumb:
1. the disk subsystem overhead of Metro Mirror at zero distance, plus
  • 12. Page 12:
2. the speed of light through dedicated "dark" fibre for a single protocol exchange (a linear function of 1 ms/100 km, or 0.1 ms/10 km), times
3. the number of protocol exchanges implemented in the specific Metro Mirror disk-to-disk implementation (for IBM DS8000 Metro Mirror, a single protocol exchange is accomplished through a feature called pre-deposit write), plus
4. other device overheads that may be on the fibre path (e.g., switches, DWDMs, compression and/or encryption devices, channel extenders).
The signal latency rule of thumb for cross-site accesses (CF links as well as disk I/O) is:
Signal latency impact (round trip) = 10 us/km x fibre distance in km x number of protocol exchanges
Example: assume two sites separated by 10 km, with a processor in site 1 accessing disk in site 2. The signal latency impact = 10 us/km x 10 km x 1 (FICON has one protocol exchange), or 100 us.
Terminology:
► Kilometer (km) - one km equals roughly 5/8 mile
► Millisecond (ms) - 10**-3 seconds
► Microsecond (us) - 10**-6 seconds
For most clients, the impact of CF signal latency beyond 40-50 km (25-30 miles) yields too great an application impact. Because of this, GDPS/PPRC multi-site implementations typically tend to be at campus or metro distances. A sketch that applies these rules of thumb follows this page.
If customer applications are not sysplex enabled, and/or the application performance impact of a multi-site configuration is too great, then the choice for these clients becomes GDPS/PPRC with HyperSwap in a single-site (active/standby) configuration. In this configuration, all hardware can be duplicated across the two sites. The secondary-site processor typically runs the GDPS control system, referred to as the K-sys. Both a planned and an unplanned site switch will involve the re-IPL of all systems in the sysplex at the recovered site, after automation has recovered and switched all dependent resources.
GDPS/PPRC prerequisites include NetView and System Automation for z/OS. GDPS automation also interacts with any existing automation products. With a multi-site Parallel Sysplex, this provides a Continuous Availability/Continuous Operations and Disaster Recovery solution. In addition, GDPS provides a set of panels for standard actions as well as the ability to customize scripts for an installation.
GDPS/PPRC multi-site sysplex: at least one system in site 2 is in the site 1 production sysplex. All production can run in site 1 with the GDPS "K-sys" running in site 2, or production can run in either or both of sites 1 and 2. Sysplex Timers and CFs are in both sites. Two fiber trunks (for availability) are recommended to connect the sites. For unplanned reconfigurations, system failures, or processor failures, systems can be restarted in place or at the other site, depending upon how they are defined.
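As a concrete illustration of the rules of thumb above, here is a minimal sketch in Python. It is purely illustrative: the function names are invented for this paper, and the sample overhead values in the example call are assumptions, not measured DS8000 or CF figures.

# Minimal sketch of the latency rules of thumb quoted above (illustrative only).
# Assumes the text's figure of 10 microseconds of round-trip signal latency per
# kilometer of fibre per protocol exchange; all other numbers are examples.

US_PER_KM_ROUND_TRIP = 10.0  # microseconds per km of fibre, round trip

def signal_latency_us(fibre_km: float, protocol_exchanges: int = 1) -> float:
    """Round-trip signal latency impact in microseconds."""
    return US_PER_KM_ROUND_TRIP * fibre_km * protocol_exchanges

def metro_mirror_write_impact_us(fibre_km: float,
                                 zero_distance_overhead_us: float,
                                 protocol_exchanges: int = 1,
                                 path_device_overhead_us: float = 0.0) -> float:
    """Estimated added write latency: zero-distance Metro Mirror overhead,
    plus distance latency, plus other devices on the fibre path."""
    return (zero_distance_overhead_us
            + signal_latency_us(fibre_km, protocol_exchanges)
            + path_device_overhead_us)

if __name__ == "__main__":
    # The worked example from the text: 10 km, one protocol exchange -> 100 us.
    print(signal_latency_us(10, 1))  # 100.0
    # Hypothetical Metro Mirror estimate at 40 km with assumed overheads.
    print(metro_mirror_write_impact_us(40, zero_distance_overhead_us=50.0,
                                       path_device_overhead_us=20.0))  # 470.0

At 100 km the distance term alone contributes a millisecond per protocol exchange, which is why active/active multi-site configurations tend to be limited to campus or metro distances as noted above.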
  • 13. Page 13: GDPS/PPRC single-site sysplex: all production images run at the primary site. The GDPS "K-sys" typically runs at site 2, and all resources are typically available at both sites. Sysplex Timers and CFs are in both sites. Two fiber trunks (for availability) are recommended to connect the sites.
The following outlines the typical resources and capabilities available at each site for GDPS/PPRC with HyperSwap:
- Base Sysplex or Parallel Sysplex environment.
- Manages unplanned reconfigurations of z/OS, CF, disk, and tape, and coordinates network connections.
- Designed to maintain data consistency and integrity across all volumes.
- Fast, automated site failover with no or limited data loss.
- Single point of control for:
  - standard actions: stop, remove, IPL system(s);
  - Parallel Sysplex configuration management;
  - couple data set (CDS) and Coupling Facility (CF) management;
  - user-defined scripts (e.g., a planned site switch);
  - PPRC configuration management.
2 Sites Beyond Metro/Sysplex Distance
GDPS solutions beyond metro/sysplex distance include GDPS/XRC and GDPS/GM. Clients select either the XRC or the GM data replication technique based on their specific requirements. XRC provides the lowest possible RPO and supports only z/OS and zLinux data. Global Mirror provides a tunable RPO (from 3-5 seconds up to 18 hours) and supports all System z and distributed systems data. With asynchronous data replication solutions, a site switch requires an automated sysplex-wide IPL. Asynchronous data replication can support a "planned site switch" with no loss of data, but to do this the applications must be shut down. Storage-based data replication technology today supports planned site failover/failback scenarios such that only changed data needs to be copied back to resynchronize the sites. This capability is available today with the various flavors of the GDPS 2-site and 3-site solutions.
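The tunable RPO of an asynchronous technique such as Global Mirror is easiest to see with a little arithmetic. The sketch below is a simplified, illustrative model only (the parameter names and numbers are assumptions, not IBM-published behavior): it treats the achievable RPO as roughly the consistency-group interval plus the time needed to drain whatever write backlog has accumulated on the replication link.

# Simplified, illustrative RPO model for asynchronous replication (not an IBM
# formula): RPO ~= consistency-group interval + time to drain the write backlog.

def estimated_rpo_seconds(write_rate_mb_s: float,
                          link_bandwidth_mb_s: float,
                          cg_interval_s: float = 3.0,
                          backlog_mb: float = 0.0) -> float:
    """Rough RPO estimate. If the link cannot keep up with the write rate,
    the backlog (and therefore the RPO) grows without bound."""
    if link_bandwidth_mb_s <= write_rate_mb_s:
        return float("inf")  # backlog keeps growing; RPO is unbounded
    drain_s = backlog_mb / (link_bandwidth_mb_s - write_rate_mb_s)
    return cg_interval_s + drain_s

# A well-provisioned link stays near the few-second end of the range.
print(estimated_rpo_seconds(write_rate_mb_s=80, link_bandwidth_mb_s=200))      # 3.0
# A constrained link with an accumulated backlog pushes the RPO out toward hours.
print(estimated_rpo_seconds(write_rate_mb_s=80, link_bandwidth_mb_s=90,
                            backlog_mb=36_000))                                 # ~3603

This is why the same technology spans a range from a few seconds to many hours: the achieved RPO depends on how the replication bandwidth is sized against the write workload.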
  • 14. Page 14: In each case, however, a sysplex-wide IPL, a data replication disk/tape switch, and a client end-user network switch must be done in a coordinated manner. In this way the sysplex is restarted, along with all databases and application workloads, at the remote site. When the various databases are restarted, "in flight" and "in doubt" transactions are resolved, and any and all coupling facility structures are rebuilt.
If a planned outage can be tolerated by the client, then switching sites on a regular basis can help to minimize the costs involved with D/R testing. Planned site switches verify that all the resources required to run the application are available at both sites. This can then be fully tested to ensure that enough capacity (processor, storage, network, etc.) is available at both sites for any and all combinations of the workload. In addition, the client is testing the complete production application end to end. Often, traditional D/R tests only verify that the 'system platform' can be IPL'ed, and, based on the time available, some minimal subset of the production workload is executed. The best D/R test is a site switch that leaves production running in each site for a reasonably long period of time (e.g., 3-6 months). During this time, the application typically goes through various periods of the business cycle, including end-of-day, end-of-week, and end-of-quarter processing. Through careful planning, one can eventually verify that all application processing can be executed independent of site. This approach fits some business models better than others. In some countries a physical site utility check is required once a year, which requires a full electrical shutdown. A site switch to the other production site may be easier in this environment, as the outage is limited to the time needed to perform the site switch and have the applications back up and running, rather than also including the time to verify all utilities at the original production site.
The simple approach to ensure that a client can easily switch sites and run all applications with similar performance, scalability, and capacity growth is to duplicate all hardware and software resources across both sites. If a client has deployed a 3-site GDPS configuration with GDPS/PPRC HyperSwap locally at the production site, one would also want to deploy the same configuration at the target sister production site. This would typically be called a 4-site configuration and is pictured below.
The emerging thought is that the money currently spent on disaster recovery testing could be decreased if one could provide, on a regular basis, the ability to switch back and forth across sites in an automated fashion. When implemented, planned site switches provide this function. That means D/R testing need only verify that the unique automation required to perform a site switch for an unplanned scenario also works. Customers minimize the differences between planned and unplanned site switch scenarios today by deploying the "test the way we recover, and recover the way we test" model. Typically today, several clients' D/R testing is done at the remote site while maintaining full D/R protection. This is done by making a point-in-time FlashCopy of the data and performing all D/R testing against that copy. When a disaster occurs, as part of the recovery process, a FlashCopy of the data is created and used for the D/R recovery process.
This minimizes the unique actions between the planned and unplanned site failover scenarios.
  • 15. Page 15: In both the planned and the unplanned site switch scenario, GDPS automation can minimize the duration of the outage, i.e., the RTO. GDPS automation also helps minimize the risk of performing a site switch, as the automation is proven, repeatable, and minimizes human errors. The Recovery Time Objective (RTO) is a measure of the time from when a planned or unplanned site switch is identified until all applications are up and running at the remote site. A key benefit of GDPS automation is that, once implemented, the RTO is a known, proven, repeatable quantity.
GDPS/Active/Standby - Application by Application Availability: If all of a client's application data is within a single database (DB2 and/or IMS), clients can implement high availability across two sites on an application-by-application basis rather than managing high availability/disaster protection on a platform basis. GDPS/Active/Standby automation enables automated 'application level' site switches that typically provide an RTO on the order of seconds to minutes. Clients use DB2-to-DB2 software data replication with IBM Tivoli InfoSphere Replication Server for z/OS, and/or IMS-to-IMS software data replication with IBM Tivoli Classic InfoSphere Replication for z/OS. In this case the DB2/IMS log entries are replicated between sites by DB2/IMS. An active z/OS image with a copy of the DB2/IMS database is running at the remote site, and all DB2/IMS updates are applied as they are received. In the event of a disaster, or for a planned site switch for this application, the end-user network is switched to route active transactions to the remote site for processing, with minimal data loss. The routing of transactions is managed by the IBM Workload Distributor software. This approach typically also requires the client to implement a strict change control process across all systems to ensure that the various system components are always updated in step, keeping the z/OS images in sync. The following picture outlines the GDPS/Active/Standby solution.
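To make the application-level active/standby idea concrete, here is a small, purely conceptual sketch in Python. It is not the IBM Workload Distributor or InfoSphere replication API; the class and its lag bookkeeping are hypothetical, and simply illustrate that transactions follow the routing decision while the potential data loss on an unplanned switch is bounded by the software replication lag.

# Conceptual sketch of application-level active/standby routing (not an IBM API).
# Transactions are routed to the active site; on a switch, routing flips to the
# standby, and any updates not yet applied there represent the potential data loss.

class ActiveStandbyWorkload:
    def __init__(self, active="SITE_A", standby="SITE_B"):
        self.active, self.standby = active, standby
        self.committed = 0            # updates committed at the active site
        self.applied_at_standby = 0   # updates already replayed at the standby

    def route(self) -> str:
        return self.active

    def commit_update(self):
        self.committed += 1

    def replicate(self, batch: int):
        # Software (log-based) replication applying updates at the standby.
        self.applied_at_standby = min(self.committed,
                                      self.applied_at_standby + batch)

    def switch_sites(self, planned: bool) -> int:
        """Flip routing to the standby; return the updates at risk of loss."""
        if planned:
            # A planned switch drains the replication lag before flipping.
            self.replicate(self.committed - self.applied_at_standby)
        at_risk = self.committed - self.applied_at_standby
        self.active, self.standby = self.standby, self.active
        return at_risk

wl = ActiveStandbyWorkload()
for _ in range(100):
    wl.commit_update()
wl.replicate(batch=97)
print(wl.switch_sites(planned=False), wl.route())  # 3 updates at risk, now SITE_B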
  • 16. Page 16: 3-Site Configurations
Several clients with an out-of-region D/R implementation, or with high availability locally, have moved to a 3-site configuration by implementing either GDPS/MzGM with HyperSwap or GDPS/MGM with HyperSwap. These configurations provide a 'local' high availability/continuous operations environment, enabling local real-time planned site switches as well as site failover/failback for a local site disaster with an RPO of zero. Some clients implement their second 'local' site on the same data center floor, or across a firewall on the same data center floor. A few customers have implemented HyperSwap locally simply to prevent a disk subsystem failure from causing a sysplex-wide outage. In all cases, the implementation focus was on increasing the availability of IT to the business locally, or on adding out-of-region D/R protection.
One key cost component in developing a multi-site solution is the duplication of the client's end-user network. Depending on the complexity and cost associated with replicating the end-user network, several clients prefer to implement a '3-site' solution across only two physical sites. At this time, IBM has deployed some 80+ GDPS/MzGM with HyperSwap or GDPS/MGM with HyperSwap multi-site configurations. The following figures outline these implementations.
  • 17. Page 17: [Figure: "GDPS/MzGM w/HyperSwap & Incremental Resync." The chart shows Site 1 and Site 2 with Metro Mirror A->B and HyperSwap, z/OS Global Mirror A->C to the recovery site (with the SDM and a recommended FlashCopy F), and incremental resynchronization B->C if Site 1 or the A-disk fails, maintaining the disaster recovery position and improving RTO. CFs and production systems are optionally in Site 2; timing is via ETR or STP.]
The standard GDPS/MzGM HyperSwap with Incremental Resync configuration provides data replication from A to B with HyperSwap, and z/OS Global Mirror data replication from A to C. On an A-to-B HyperSwap event, Incremental Resynchronization for GDPS/MzGM enables the z/OS Global Mirror session to be re-established from A->C to B->C. GDPS manages the z/OS Global Mirror sessions so that only changed tracks need to be sent to the recovery site, instead of requiring a full-volume copy to re-establish the disaster recovery copy. This can greatly reduce the time required (in some cases from hours down to minutes) to reconnect to the remote site, reducing the risk of not being protected.
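The "hours down to minutes" claim is essentially the ratio of data to be re-sent. The sketch below is an illustrative back-of-the-envelope calculation only; the capacity, change rate, and bandwidth figures are assumptions, not measurements from any GDPS installation.

# Illustrative arithmetic only: full-volume copy vs. incremental resync.
# All capacity, change-rate, and bandwidth figures below are assumed examples.

def transfer_hours(data_gb: float, link_mb_s: float) -> float:
    """Hours needed to push data_gb gigabytes over a link of link_mb_s MB/s."""
    return (data_gb * 1024) / link_mb_s / 3600

mirrored_capacity_gb = 50_000   # assumed size of the mirrored configuration
changed_fraction = 0.02         # assumed share of tracks changed since the swap
link_mb_s = 400                 # assumed effective replication bandwidth

full_copy_h = transfer_hours(mirrored_capacity_gb, link_mb_s)
incremental_h = transfer_hours(mirrored_capacity_gb * changed_fraction, link_mb_s)

print(f"full-volume copy  : {full_copy_h:.1f} hours")          # ~35.6 hours
print(f"incremental resync: {incremental_h * 60:.0f} minutes")  # ~43 minutes

With these assumed numbers the difference is roughly a day and a half versus well under an hour, which is the order-of-magnitude improvement the text describes.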
  • 18. Page 18: [Figure: "GDPS/MGM w/HyperSwap." The chart shows Site 1 and Site 2 with Metro Mirror and HyperSwap between them, Global Mirror to the recovery site (with FlashCopy F recommended), backup production and non-z systems at each location, the GDPS/GM K-sys (Kg) running in a production LPAR with HyperSwap protection and a reduced resource requirement, and timing via ETR or STP.]
The standard GDPS/MGM with HyperSwap configuration provides data replication from A->B->C. The GDPS/GM K-sys (Kg) can run in a GDPS/PPRC production system, reducing the number of z/OS images required for an MGM configuration. Key characteristics:
- Incremental resync from A->C if Site 2 or the B-disk fails (this requires A->C bandwidth).
- The GDPS/GM K-sys runs in a production system, with HyperSwap protection for the GDPS/GM K-sys and a reduced resource requirement.
- The disaster recovery position is maintained following the resync, improving the RPO.
The Kg system lives in P2, which is a production system. P2 runs GDPS/PPRC in one NetView; in another NetView it runs the GDPS/GM K-sys function. The P2 disk is PPRCed and protected by HyperSwap, including any disk related to the "Kg system function." P2 can live in either Site 1 or Site 2 and carries the Kg system along with it: when P2 is moved, the Kg system function moves with it.
  • 19. Page 19: 3-site configurations provide additional options, as well as additional considerations, when performing site switches:
1. If the two local sites are physically separated for both high availability and local D/R protection, is it still a requirement, after a remote site switch, to have two physically split local sites at that location as well? The alternative would be two logical sites within the same physical site, perhaps separated by a physical firewall. In the site toggle model this consideration may be very different than if the remote site is used only in the event of a disaster. In the disaster-site scenario, high availability may be added to that site after the business is back up and running again. The site toggle model views all sites as production-ready, whereas the disaster/recovery site model views the remote site as actually being used only in the event of a disaster. Both models are valid; the choice depends on the client's business requirements.
2. The various fallback scenarios, and the additional copies of the disk required to support each of them, should be investigated and fully understood for both the GDPS/MzGM and the GDPS/MGM options.
3. As mentioned above, end-user network connectivity to each data center can significantly influence the costs associated with the ultimate solution.
A recognized customer requirement in this area is to provide exactly the same functionality at the target site (high availability plus disaster recovery protection) on a planned, and when possible an unplanned, site switch. That is, the ability to use asynchronous data replication back to the original production site as well as local HyperSwap functionality. With this capability, both sites provide equal functionality to the business, enabling a peer-site configuration.
Distributed Systems
As mentioned earlier in this paper, with the GDPS/DCM capability GDPS automation can inter-operate with either Tivoli AppMan or Veritas Cluster Server to provide end-to-end automated management of various distributed platforms in 2-site or 3-site configurations. Cross-system data consistency can also be provided via the DS8000 Open LUN support; with this function, GDPS can provide a common restart point across all z/OS and distributed systems data. Today, high availability of data is provided through distributed systems software mirroring, typically called LVM mirrors. Data availability for disaster recovery can be provided through hardware- and software-based data replication functions. Functionality in this arena will continue to evolve as clients develop more and more cross-platform applications.
Future Vision
The next chart outlines the evolution from a single server into an enterprise-wide business continuity solution. Single servers became clustered servers; clustered servers then spanned physical sites. This was then extended to end-to-end multi-site heterogeneous clusters, followed by integrated end-to-end multi-site clusters. The emerging trend for z/OS is next toward multiple application-level Active/Active sites at distance, coupled with the traditional platform-based high availability and disaster/recovery solutions.
  • 20. Page 20: Conclusion
The requirements for real-time high availability, continuous operations, and disaster recovery for z/OS, as well as for distributed systems, continue to push IBM to provide 24x7 computing environments with superior business resilience functionality. New Smarter Planet applications typically deal with real-time data that needs to be captured, stored, and analyzed in real time on a 24x7 basis. These applications and volumes of data also introduce new requirements in scalability as well as challenges in total cost of ownership. The management of IT operations across a single site, or across multiple sites locally or at distance, presents the opportunity to optimize all compute resources to maximize their utilization, as well as to enable them to meet the business requirements of end-user clients today and tomorrow. The ultimate goal of the emerging trends is to enable applications and their platforms to be virtualized and run across physical data centers located around the world. The z/OS platform, coupled with GDPS automation, has become the leading edge of general-purpose solutions toward this end.
  • 21. Page 21: Author
Bob Kern - IBM Advanced Technical Support, Americas (bobkern@us.ibm.com). Mr. Kern is an IBM Master Inventor and Executive IT Architect. He has 36 years of experience in large system design and development and holds numerous patents on storage-related topics. For the last 28 years, Bob has specialized in disk device support and is a recognized expert in continuous availability, disaster recovery, and real-time disk mirroring. He created the DFSMS/MVS subcomponents for the Asynchronous Operations Manager and the System Data Mover. Bob was named a Master Inventor by the IBM Systems & Technology Group in 2003 and is one of the inventors of the Concurrent Copy, PPRC, XRC, GDPS, and zCDP solutions. He continues to focus on hardware/software solutions in the disk storage architecture area for continuous availability and data replication. He is a member of the GDPS core architecture team and the GDPS Customer Design Council, with a focus on storage-related topics.