This document discusses mainframe clustering techniques including logical partitioning (LPAR), basic shared storage, channel-to-channel connections (CTC), and parallel sysplex. A parallel sysplex uses a coupling facility to connect multiple mainframes, providing data sharing and workload distribution. Geographically dispersed parallel sysplex (GDPS) extends this across data centers, with technologies like PPRC for synchronous data mirroring. GDPS includes a controlling system to automate failure recovery and switching between primary and secondary sites. Mainframe clustering maximizes availability, scalability and disaster recovery capabilities.
GDPS and System Complex
1. GDPS AND SYSTEM COMPLEX
MAINFRAME CLUSTERING
Najmi Mansoor Ahmed
Principal Architect PSS (IBM ALCS v241 under z/OS 2.1)
Presented on 23-Nov-2016
2. BASICS
• A system is made up of hardware products, including a central processor (CPU), and software products, the primary one being an operating system such as z/OS.
• The CPU and other system hardware, such as channels and storage (RAM), make up a Central Processor Complex (CPC) or, in general terms, a mainframe box.
[Diagram: a mainframe with attached disks]
3. • It is possible to run a mainframe with a single processor, or uniprocessor (CP), but this is not a typical system.
• When all the CPs share central storage and a single OS image manages the processing, work is assigned to whichever CP is available to do it. If a CP fails, work can be routed to another CP.
[Diagram: a mainframe with attached disks, a single z/OS image, and CP1]
Multi and Uniprocessor
The ability to partition a large system into multiple smaller systems, called logical
partitions or LPARs, is now a core requirement in practically all mainframe installations.
4. It allows virtual clusters of CPUs, operating systems, and applications to be built within a single box.
Multi and Uniprocessor
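The dispatching behaviour described on slide 3, where work goes to any available CP and is rerouted around a failed one, can be sketched conceptually. This is illustrative Python only, not actual z/OS dispatcher code; the `CP` class and the round-robin policy are simplifying assumptions for the example:

```python
# Conceptual sketch (not mainframe code): work is assigned to any available CP,
# and rerouted when a CP fails, as in a tightly coupled multiprocessor.

class CP:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.completed = []

def dispatch(work_items, cps):
    """Assign each unit of work to an online CP (simple round-robin)."""
    online = [cp for cp in cps if cp.online]
    if not online:
        raise RuntimeError("no CPs available")
    for i, item in enumerate(work_items):
        online[i % len(online)].completed.append(item)

cps = [CP("CP1"), CP("CP2"), CP("CP3"), CP("CP4")]
cps[1].online = False          # CP2 fails; its work is routed elsewhere
dispatch(["job1", "job2", "job3"], cps)
```

The failed CP receives no work, while the single system image keeps processing on the remaining CPs.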
5. • Mainframe has three major clustering techniques:
• Basic shared DASD
• CTC rings
• Parallel Sysplex
Clustering
Basic Shared Storage (DASD)
It is typically used when operations staff control which jobs go to which system and ensure that there is no conflict (both systems trying to update the same data at the same time).
A channel is a high-speed data bus. Today's mainframes use FICON (FIber CONnection) channels.
[Diagram: two LPARs, each with its own z/OS image and channels, sharing a disk control unit]
6. • Channel-to-Channel (CTC) ring
Channel-to-Channel (CTC)
A CTC simulates an I/O device that can be used by one system to communicate with another, and provides the data path and synchronization for data transfer.
When CTCs are used to connect two systems, a loosely coupled multiprocessing system is established.
[Diagram: two LPARs, each with its own z/OS image, channels, and disk control unit, connected by a CTC ring]
7. • A loosely coupled configuration has more than one mainframe, managed by more than one z/OS image.
• Although a loosely coupled configuration increases system capacity, it is not as easy to
manage as either a uniprocessor or a tightly coupled multiprocessor.
• Each system must be managed separately, often by a human operator, who monitors
product-specific messages on a set of consoles for each system.
• Products and applications that need to communicate and are running on separate
systems have to create their own communication mechanism.
Loosely coupled multiprocessors
[Diagram: two z/OS images, each managing four CPs (CP1–CP4)]
8. • To help solve the difficulties of managing many z/OS systems, IBM introduced the z/OS system complex, or sysplex.
• A sysplex is a collection of z/OS systems that cooperate, using certain hardware and software products, to
process work.
Sysplex
[Diagram: four mainframes, each with its own disks, cooperating in a sysplex]
9. • SYSPLEX (System Complex) is the clustering of multiple systems for availability, workload sharing, recovery, and resource and data sharing.
• A system complex can be built in the same data centre, or across two data centres, from 2 to 32 mainframes.
Sysplex
10. SYSPLEX
It is a clustering technique:
• Every server (node) / LPAR has access to the data resources
• Every cloned, sysplex-enabled application can run on every LPAR
• The sysplex appears as a single large system with a single operating interface to control it
12. Base Sysplex
• Joining systems through Channel-to-Channel (CTC) connections
Types of Sysplex
[Diagram: two LPARs, each with its own z/OS image, channels, and disk control unit, joined by a CTC ring]
13. PARALLEL SYSPLEX
A Parallel Sysplex is a cluster of IBM mainframes acting together
as a single system image with z/OS.
Used for disaster recovery, Parallel Sysplex combines data
sharing and parallel computing to allow a cluster of up to 32
systems to share a workload for high performance and high
availability.
14. PARALLEL SYSPLEX
Parallel Sysplex = Base Sysplex + Coupling Facility (CF)
[Diagram: Mainframe 1 at Site A and Mainframe 2 at Site B forming a system complex (sysplex)]
15. Parallel Sysplex
• An enhancement to the Base Sysplex: systems are joined through a Coupling Facility (CF) in addition to CTC connections
Types of Sysplex
[Diagram: two LPARs, each with its own z/OS image, channels, and disk control unit, joined by a CTC ring and by CF channels to a Coupling Facility]
16. Types of Sysplex
[Diagram: two LPARs joined by a CTC ring and by CF channels to a Coupling Facility; each LPAR has its own z/OS image, channels, disk control unit, and disks]
Parallel Sysplex is an enabling technology with two critical capabilities:
• Parallel processing
• Read/write data sharing across multiple systems with full data integrity: a "shared data" model (as opposed to "shared nothing")
17. A key component in any Parallel Sysplex is the Coupling Facility (CF) infrastructure.
Coupling facility = sharing of central memory between two systems.
Parallel Sysplex
Parallel Sysplex is analogous in concept to a UNIX cluster: it allows the customer to operate multiple copies of the operating system as a single system. Systems can be added or removed as needed while applications continue to run.
[Diagram: primary and secondary system complexes (clusters); production, standby, and non-production workloads distributed across both]
18. Coupling Facility (CF) structure
• A coupling facility is a special logical partition that runs the coupling facility control code (CFCC) and provides high-speed caching, list processing, and locking functions in a sysplex.
• A CF functions largely as a fast scratch pad. It is used for three purposes:
• Locking information that is shared among all attached systems
• Cache information (such as for a database) that is shared among all attached systems
• Data list information that is shared among all attached systems
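The three structure types above can be illustrated with a small conceptual sketch. This is plain Python for illustration only, not CFCC code; the class and method names are invented for the example:

```python
# Conceptual sketch of the three CF structure types (illustrative only):
# lock, cache, and list structures shared by all attached systems.

class CouplingFacility:
    def __init__(self):
        self.locks = {}    # lock structure: resource -> owning system
        self.cache = {}    # cache structure: key -> shared data
        self.lists = {}    # list structure: queue name -> list of entries

    def acquire_lock(self, resource, system):
        """Grant the lock only if no other system currently holds it."""
        if self.locks.get(resource, system) != system:
            return False
        self.locks[resource] = system
        return True

    def release_lock(self, resource, system):
        if self.locks.get(resource) == system:
            del self.locks[resource]

cf = CouplingFacility()
assert cf.acquire_lock("DB.REC42", "SYSA")      # SYSA gets the lock
assert not cf.acquire_lock("DB.REC42", "SYSB")  # SYSB must wait
cf.release_lock("DB.REC42", "SYSA")
assert cf.acquire_lock("DB.REC42", "SYSB")      # now SYSB can update
```

The point is that the lock, cache, and list state lives in one shared place, so every attached system sees the same view.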
19. Characteristics of Parallel Sysplex
• A common time source to synchronize all mainframe systems' clocks
• A coupling facility (CF): sharing of central memory between systems for high-performance data sharing
• Cross-System Coupling Facility (XCF), which allows systems to communicate peer-to-peer
• Global Resource Serialization (GRS), which allows multiple systems to access the same resources concurrently, serializing where necessary to ensure exclusive access and prevent conflicting updates to the same data
• Couple Data Sets (CDS), required by the sysplex to store information about its systems
[Diagram: two z/OS LPARs, each with XCF and GRS, connected by FICON to a Coupling Facility, a Sysplex Timer, and Couple Data Sets (CDS)]
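The GRS behaviour described above, concurrent access with serialization where necessary, can be sketched conceptually. This is illustrative Python, not z/OS ENQ/DEQ: shared (read) requests coexist, while an exclusive (update) request must wait for all other holders.

```python
# Conceptual sketch of GRS-style serialization (illustrative only):
# readers can share a resource; a writer needs exclusive access.

class Resource:
    def __init__(self, name):
        self.name = name
        self.readers = set()
        self.writer = None

    def enq(self, system, exclusive=False):
        """Request access; return True if granted, False if it must wait."""
        if exclusive:
            if self.readers or self.writer:
                return False          # wait: other systems are active
            self.writer = system
        else:
            if self.writer:
                return False          # wait: a writer holds the resource
            self.readers.add(system)
        return True

    def deq(self, system):
        """Release whatever access this system holds."""
        self.readers.discard(system)
        if self.writer == system:
            self.writer = None

ds = Resource("SYS1.PAYROLL")
assert ds.enq("SYSA") and ds.enq("SYSB")   # concurrent readers are fine
assert not ds.enq("SYSC", exclusive=True)  # an update must wait
```

This is the pattern that prevents two systems from updating the same data at the same time while still allowing concurrent reads.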
20. Characteristics of Parallel Sysplex
• The best practice for any data-sharing Parallel Sysplex is to implement at least one failure-isolated CF.
• It is critical that every Parallel Sysplex has at least two CFs accessible to every member of the sysplex.
[Diagram: two z/OS LPARs, each with XCF and GRS, connected by FICON to two Coupling Facilities, a Sysplex Timer, and Couple Data Sets (CDS)]
21. Characteristics of Parallel Sysplex
• A timer is a mandatory hardware requirement for a Parallel Sysplex consisting of more than one zSeries server.
• It synchronizes the time-of-day (TOD) clocks of multiple servers, thereby allowing events started by different servers to be properly sequenced in time.
• When multiple servers update the same database, all updates must be time-stamped in the proper sequence.
• The Server Time Protocol feature is designed to allow multiple servers and Coupling Facilities to maintain time synchronization with each other.
• Timer redundancy allows the sysplex to stay up if either timer has a planned or unplanned outage.
[Diagram: two z/OS LPARs, each with XCF and GRS, connected by FICON to two CFs, an STP timer, and Couple Data Sets (CDS)]
Server Time Protocol (STP) is a time synchronization architecture designed to provide the capability for multiple servers (CPCs) to maintain time synchronization with each other and to form a Coordinated Timing Network (CTN).
To maintain time accuracy, the STP facility supports connectivity to an External Time Source (ETS).
STP is IBM's Licensed Internal Code (LIC).
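The point about time-stamping updates in the proper sequence can be shown with a minimal sketch, assuming all servers' TOD clocks are synchronized (which is exactly what STP provides). This is illustrative Python only; the record layout is invented for the example:

```python
# Conceptual sketch (illustrative only): updates from different servers are
# sequenced by timestamp, which is meaningful only if all servers agree on
# the time -- the reason a common time source is mandatory.

def sequence_updates(updates):
    """Order database updates from multiple servers by their TOD timestamps."""
    return sorted(updates, key=lambda u: u["tod"])

updates = [
    {"server": "CPC2", "tod": 1000.002, "op": "debit account"},
    {"server": "CPC1", "tod": 1000.001, "op": "open account"},
    {"server": "CPC1", "tod": 1000.003, "op": "close account"},
]
ordered = [u["op"] for u in sequence_updates(updates)]
```

With synchronized clocks the operations replay in the intended order (open, debit, close); with drifting clocks the same sort could interleave them incorrectly.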
22. Parallel Sysplex - Benefits
Continuous Availability
With a Parallel Sysplex cluster it is possible to construct a parallel processing environment with no single points of failure: because of the redundancy in the configuration, there is a significant reduction in their number.
Hardware and software maintenance and installations can be performed in a non-disruptive manner. Through data sharing and dynamic workload management, servers can be dynamically removed from or added to the cluster, allowing installation and maintenance activities to be performed while the remaining systems continue to work.
Capacity
A Parallel Sysplex environment can scale near-linearly from 2 to 32 systems.
Dynamic Workload Balancing
The entire Parallel Sysplex cluster can be viewed as a single logical resource to end users and business applications.
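The dynamic workload balancing benefit can be sketched conceptually: the cluster presents itself as one logical resource, and each incoming request is routed to the least-loaded member. This is illustrative Python only, not actual workload-manager logic; the member records and routing policy are invented for the example:

```python
# Conceptual sketch (illustrative only): route each request to the
# least-loaded sysplex member, so the cluster behaves as one resource.

def route(request, members):
    """Send the request to the member with the lowest current load."""
    target = min(members, key=lambda m: m["load"])
    target["load"] += 1
    target["requests"].append(request)
    return target["name"]

members = [{"name": n, "load": 0, "requests": []} for n in ("SYS1", "SYS2", "SYS3")]
members[0]["load"] = 5                      # SYS1 is already busy
assert route("txn-1", members) in ("SYS2", "SYS3")
```

Callers never name a specific system; the routing layer hides which member actually runs the work.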
23. GDPS
Geographically Dispersed Parallel Sysplex (GDPS) is an extension of Parallel Sysplex to mainframes potentially located in different cities and/or data centres.
GDPS is an end-to-end application availability solution.
24. Geographically Dispersed Parallel Sysplex - GDPS
• It is the ultimate disaster recovery and continuous availability solution for a multi-site enterprise.
• GDPS is a combination of storage and Parallel Sysplex technology.
• It automates Parallel Sysplex operation tasks and performs failure recovery from a single point of control.
• Types of GDPS configurations:
• GDPS/PPRC is based on synchronous data mirroring technology (PPRC) that can be used on mainframes up to 200 kilometres (120 mi) apart.
• GDPS/XRC uses asynchronous Extended Remote Copy (XRC) technology with no restrictions on distance.
• GDPS/Global Mirror is based on asynchronous IBM Global Mirror technology with no restrictions on distance.
• GDPS/Active-Active is a disaster recovery / continuous availability solution, based on two or more sites, separated by unlimited distances, running the same applications and having the same data to provide cross-site workload balancing.
25. GDPS ACTIVE/ACTIVE
To achieve a GDPS Active/Active configuration:
• All critical data must be PPRC-mirrored and HyperSwap-enabled
• All critical CF structures must be duplexed
• Applications must be Parallel Sysplex enabled
26. GDPS/PPRC
GDPS/PPRC is a metro-area Continuous Availability (CA) and Disaster Recovery (DR) solution, based upon:
• A multi-site Parallel Sysplex (SYStem comPLEX)
• Synchronous disk replication
It supports two configurations:
• Active/standby, or single-site workload
• Active/active, or multi-site workload
27. GDPS / PPRC
[Diagram: active disks at Site 1 mirrored to warm-standby disks at Site 2 via PPRC / Metro Mirror]
• Even with the multi-path and RAID architecture within DASD subsystems, a single copy of the data remains a single point of failure (SPOF).
• A failure of a disk subsystem, or even of a single disk array, can take down major applications, the system, or even the sysplex.
• GDPS/PPRC uses IBM disk replication technology to remove this SPOF.
28. GDPS HyperSwap
A Parallel Sysplex environment is designed to reduce outages by replicating hardware, operating systems, and application components. In spite of this redundancy, having only one copy of the data is an exposure.
If there is a problem writing to or accessing the primary disks, I/O must be swapped from the primary disks to the secondary disks.
HyperSwap, a feature of GDPS, enhances resilience by immediately switching I/O operations from the primary to the secondary disks, providing near-continuous access to data.
29. GDPS HyperSwap
[Diagram: primary disks at the primary site mirrored via PPRC / Metro Mirror to secondary disks at the secondary site; GDPS controlling systems K1 and K2; on a failure, HyperSwap redirects I/O to the secondary disks]
HyperSwap provides continuous availability of data by masking disk outages: it automates switching between the two copies of the data in real time, without causing an application outage.
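The HyperSwap behaviour on this slide, synchronous mirroring plus a transparent switch to the secondary copy when the primary fails, can be sketched conceptually. This is illustrative Python only, not GDPS code; the class and its fields are invented for the example:

```python
# Conceptual sketch (illustrative only): writes are mirrored synchronously,
# and on a primary-disk failure, I/O is redirected to the secondary copy
# so the application keeps running.

class MirroredDisk:
    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.primary_ok = True

    def write(self, key, value):
        if self.primary_ok:
            self.primary[key] = value
        self.secondary[key] = value   # synchronous (PPRC-style) mirror

    def read(self, key):
        # HyperSwap idea: mask a primary failure by reading the secondary
        source = self.primary if self.primary_ok else self.secondary
        return source[key]

disk = MirroredDisk()
disk.write("rec1", "balance=100")
disk.primary_ok = False                    # primary disk subsystem fails
assert disk.read("rec1") == "balance=100"  # the outage is masked
```

Because the mirror is synchronous, the secondary copy is always current, which is what makes the instant switch safe.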
30. GDPS controlling system
• In order for GDPS to operate, there must be a separate, isolated z/OS system known as the controlling system.
• GDPS environments without a controlling system are not supported.
• IBM strongly recommends that two controlling systems be set up per sysplex.
• The idea is for one to act as a primary and the other as a backup.
[Diagram: GDPS controlling systems K1 at the primary site and K2 at the secondary site, with PPRC / Metro Mirror between the sites]
31. What does the GDPS controlling system do?
1. It performs situation analysis (after an unplanned event) to determine the status of the production systems and/or disks.
2. It drives automated recovery actions.
The controlling system must be in the same sysplex so that it can see all the messages from the systems in the sysplex and communicate with them.
[Diagram: controlling systems K1 at the primary site and K2 at the secondary site, each a z/OS LPAR with XCF, GRS, CF access, and Couple Data Sets, linked by PPRC / Metro Mirror]
32. Why does a GDPS configuration need a controlling system?
The availability of the controlling system is fundamental to GDPS.
The GDPS controlling system is designed to survive a failure in the site opposite to where the primary disks are. The primary disks are normally in Site 1, and the controlling system in Site 2 is designed to survive if Site 1, or the disks in Site 1, fail.
[Diagram: K1 at the primary site and K2 at the secondary site, each an ALCS system with XCF, GRS, CF access, and Couple Data Sets, linked by PPRC / Metro Mirror]
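The controlling system's two roles, situation analysis and driving recovery, can be sketched conceptually. This is illustrative Python only; the heartbeat-timeout mechanism and the recovery-action strings are invented for the example, not actual GDPS automation:

```python
# Conceptual sketch (illustrative only): the controlling system in Site 2
# assesses which production systems are still alive, then drives recovery
# actions for the ones that are not.

def situation_analysis(heartbeats, now, timeout=5.0):
    """Classify each production system as up/down from last-heartbeat times."""
    return {name: (now - last) <= timeout for name, last in heartbeats.items()}

def drive_recovery(status):
    """Return the recovery actions to drive for each failed system."""
    return [f"swap disks and restart {s} at Site2"
            for s, up in status.items() if not up]

heartbeats = {"PROD1": 100.0, "PROD2": 96.0}   # last heartbeat timestamps
status = situation_analysis(heartbeats, now=103.0)
assert status == {"PROD1": True, "PROD2": False}
assert drive_recovery(status) == ["swap disks and restart PROD2 at Site2"]
```

Keeping this logic in an isolated system, in the opposite site from the primary disks, is what lets it still run when Site 1 fails.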
33. Final view - Combining the jigsaw puzzle
[Diagram: the complete GDPS-managed configuration. Primary and secondary sites 40 km apart; at each site an ALCS system (K1 and K2) with XCF, GRS, CF, and Couple Data Sets; CF links (timer links), PPRC links, and ISL channels carried over ADVA/DWDM; TIBCO applications and the ESW network at both sites]
34. Conclusion
Mainframe physical clustering (system complex / sysplex) between dispersed data centres (GDPS) provides enterprise-level disaster recovery, data sharing, and parallel computing capability to share workload for high performance and high availability.
35. GDPS AND SYSTEM COMPLEX
NAJMI MANSOOR AHMED