Toronix - IBM WebSphere HA & High Availability Concepts

IBM SOA
© 2007 IBM Corporation
IBM – BCBS High Availability Fast Track
Robert R. Rowntree
SOA Enterprise Architect
IBM Software Group

IBM confidential: IBM – BCBS High Availability Fast Track
Agenda
• Introduction
- Availability, 9's uptime, Work Patterns, Reference Architecture
• HA WebSphere Systems
- Products that leverage WAS, SPOFs, WAS internals
- Clustering, HA topologies, HAManager
• Managing HA WebSphere Systems
- ITCAM, SOA Security, SOA Management
- SLAs
• HA Failure Scenarios
- HTTP servers, JVMs
- Containers: Portlet, Web, EJB
- LDAP, databases, WMQ, MEs (messaging engines)

HA – What Is Availability? Biggest Impact: Applications and People
Availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair
• Downtime: planned vs. unplanned; engineer for unplanned
- Engineer for unplanned downtime during business hours.
- The BIGGEST sources of unplanned downtime are software (applications) and people errors.
• Human errors: expertise level, training and tooling. Some companies have none of the three and are SOL from the start.
• Software errors: network, server, middleware and applications; the biggest culprit is APPLICATIONS.
• 7X24X365, 6X20 or 5X12: when will it go down? Nobody knows.
- Most applications still need minimal disruption during business hours.
- What differs is the amount of planned downtime.
- You usually can't predict when unplanned downtime will occur. Exceptions? Loads on paydays at banks.
- On a daily basis, most businesses can tolerate similar amounts of downtime during business hours.
• The focus is usually on MTTR.
- The strategy is to engineer in fault detection and automatic failover wherever possible.
- Aircraft have multiple engine systems; nuclear plant electronics use triplicated logic.
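The formula Availability = MTBF / (MTBF + MTTR) explains why the focus is on MTTR: shrinking repair time raises availability even when failures happen just as often. A minimal Python sketch of the arithmetic; the MTBF and MTTR figures are illustrative assumptions, not measurements from any system.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(a: float) -> float:
    """Expected unavailable hours per year at availability a."""
    return (1.0 - a) * HOURS_PER_YEAR


# Illustrative assumption: one failure every 1000 hours, repaired either
# manually (4 hours) or by automatic failover (5 minutes).
MTBF = 1000.0
for label, mttr in [("manual repair", 4.0), ("auto failover", 5 / 60)]:
    a = availability(MTBF, mttr)
    print(f"{label}: A = {a:.5f}, about {downtime_hours_per_year(a):.1f} h/yr down")
```

With MTBF unchanged, cutting MTTR from 4 hours to 5 minutes takes expected downtime from roughly 35 hours a year to under one hour.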

Levels of Availability
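The "9's uptime" levels translate directly into an allowed amount of downtime per year. A short Python sketch of that arithmetic, assuming a 365-day year and treating N nines as an availability of 1 - 10^-N.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for N nines (3 nines = 99.9% available)."""
    return 10.0 ** (-nines) * MINUTES_PER_YEAR


for n in range(2, 6):
    pct = 100 * (1 - 10.0 ** (-n))
    minutes = downtime_minutes_per_year(n)
    print(f"{pct:.3f}% available: {minutes:8.1f} min/yr ({minutes / 60:.2f} h)")
```

Each extra nine cuts the allowed downtime by a factor of ten: about 8.8 hours a year at three nines, about 5.3 minutes at five.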

Availability Matrix

Latency for various Workload Patterns

Reference Architecture

Reference Architecture: Product View

Agenda
• Introduction
- SLAs

WebSphere Systems – Team of Products Leverage WAS
• Several products leverage the HA capabilities of WebSphere Application Server:
1. WebSphere Portal Server
2. WebSphere Process Server
3. WebSphere ESB
4. WebSphere Partner Gateway

WebSphere System – Single Points of Failure
Failure point: possible solutions
- Firewalls: firewall clustering, firewall sprayers
- Caching proxy: backup caching proxy
- HTTP sprayer: backup load balancer
- Web server: multiple web servers
- WAS master repository, log files: HA shared file system, NFS, hardware-based clustering
- WAS: horizontal clustering, vertical clustering or both
- Node agent: multiple node agents in the cluster, node agent as an OS service
- Deployment Manager: OS service; not a SPOF
- Entity EJB, application DB: HA databases
- Default message provider: HAManager configured
- Default message provider data store: clustering, data replication and parallel databases
- Application database: clustering, data replication, parallel databases
- Session database: memory-to-memory replication, DB clustering
- Transaction logs: shared file system; HAManager provides failover
- WMQ: WMQ cluster
- LDAP: master replica, HA LDAP
- Hubs: multiple interconnected network paths
- OS and other software crashes: clustering, switching to a healthy node
- Software and hardware upgrades: rolling upgrades with clustering or WLM for 7X24X365; planned maintenance
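One way to apply this list is to inventory every component with the hosts its instances run on and flag anything whose instances all share one host. A minimal Python sketch; the topology, component names and host names are hypothetical, and it deliberately ignores shared dependencies such as a common network hub or power supply.

```python
# Hypothetical topology: component -> list of (instance, host) pairs.
Topology = dict[str, list[tuple[str, str]]]


def find_spofs(topology: Topology) -> list[str]:
    """Return the components that one host failure can take down.

    A component is flagged when all of its instances run on a single host,
    which covers both a lone instance and vertical-only clustering.
    """
    spofs = []
    for component, instances in topology.items():
        hosts = {host for _, host in instances}
        if len(hosts) < 2:
            spofs.append(component)
    return sorted(spofs)


topology: Topology = {
    "Load balancer": [("lb1", "hostL"), ("lb2", "hostM")],
    "HTTP server": [("ihs1", "hostA")],
    "WAS cluster": [("member1", "hostA"), ("member2", "hostA")],  # vertical only
    "Database": [("db1", "hostC")],
}
print(find_spofs(topology))
```

Counting distinct hosts rather than instances is the design choice that matters: two JVMs on one box still share that box's fate.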

Deployment Manager Failure – Not a SPOF
• Not a single point of failure
1. In WAS V6 the Deployment Manager does not keep the routing tables for clustered resources such as applications or messaging engines.
   - In WAS V6 that role is held by an elected member of the cluster, so the routing table is now fault tolerant.
2. It is responsible only for less run-time-critical tasks, such as:
   1. Configuration changes
   2. Performance monitoring
   3. JMX routing through the DM to other components in the cell

Data – Key Point of Failure
• WAS and other components can't run without data.
• You can design in redundancy at most tiers, but if the data is not available, WAS systems can't run.
• Key data components required by WAS:
- Application data
- Administrative repositories
- Persistent session datastore
- Messaging engine datastores
- Transaction logs
- WebSphere system and application binaries
- HTML, images and files

Session Management – Tracking Down a Session
The address that tells the plug-in where to deliver a returning request has 4 components.
1. Cache ID
2. Session ID: ID of the session once back in the JVM.
3. Clone ID: used by the HTTP server plug-in to determine the application server (both its IP and port). On failover, the failover clone ID is appended; if failback is set, then once the original JVM is back up the session will FAIL BACK. This rebalances load, which is important when a cluster has a small number of nodes.
   1. Multiple app servers on one node provide vertical scaling.
   2. Both IP and port are needed; the HTTP plug-in on the HTTP servers decodes them from the cookie.
A load balancer can typically determine the destination IP, but it can't determine both the IP and the port from the WAS-generated session ID. The HTTP plug-in translates the session ID, using an XML file generated by the application server, into the IP and port endpoint that precisely identifies the originating application server.
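The routing decision described above can be sketched in Python. Everything here is an assumption for illustration: the cookie layout (a 4-character cache ID, the session ID, then clone IDs separated by colons, with a failover clone ID appended), the sample values, and the hypothetical CLONE_ENDPOINTS table, which stands in for the XML file the plug-in reads. It is not the plug-in's actual algorithm.

```python
# Hypothetical stand-in for the XML routing file: clone ID -> (IP, port).
CLONE_ENDPOINTS = {
    "v544d0o0": ("10.0.0.11", 9080),
    "v544d0o1": ("10.0.0.11", 9081),  # second JVM on the same node (vertical)
    "v544d0o2": ("10.0.0.12", 9080),  # JVM on another node (horizontal)
}


def parse_session_cookie(value: str) -> tuple[str, str, list[str]]:
    """Split a cookie value into (cache ID, session ID, clone IDs)."""
    head, *clone_ids = value.split(":")
    return head[:4], head[4:], clone_ids


def route(value: str, alive: set[str], failback: bool = True) -> tuple[str, int]:
    """Choose the (IP, port) endpoint for a request carrying this cookie."""
    _, _, clone_ids = parse_session_cookie(value)
    # With failback, try the original clone first; without it, stay on the
    # most recently appended failover clone.
    candidates = clone_ids if failback else clone_ids[::-1]
    for clone in candidates:
        if clone in alive:
            return CLONE_ENDPOINTS[clone]
    raise LookupError("no live server owns this session")


cookie = "0000A2MB4IJozU_VM8IffsMNfdR:v544d0o0:v544d0o2"
print(route(cookie, alive={"v544d0o2"}))              # original JVM down: failover
print(route(cookie, alive={"v544d0o0", "v544d0o2"}))  # original back: failback
```

The port in each endpoint is what lets two JVMs on the same IP (vertical scaling) be told apart, which an IP-only load balancer cannot do.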

WebSphere Application Server – Internal Architecture

Clustering – Vertical and Horizontal

Scope of Isolation with System z
Clustering Possible | Nodes     | LPAR      | CEC       | Geo Dispersion | Isolation | Cost
Vertical only       | 1         | Same      | Same      | No             | Minimal   | Lowest
Both                | 1 or more | Same      | Same      | No             |           |
Both                | 1 or more | 1 or more | Same      | No             |           |
Both                | 1 or more | 1 or more | 1 or more | No             |           |
Both                | 1 or more | 1 or more | 1 or more | Yes            | Highest   | Very high

Topology – HA Level 1 – Single Node
Best Use – Low cost, applications with low availability needs, test environments
SPOFs – HTTP server, admin servers, database
Advantage – Lowest effort to maintain, out-of-the-box install
Disadvantage – Almost everything is a SPOF

Topology – HA Level 2 – Vertical Scaling
Best Use – Low cost, some degree of failover required
SPOFs – HTTP server, database, FW, LDAP
Advantage – Failover if one application server (JVM) crashes or temporarily runs out of threads
Disadvantage – Out of service (SOL) if there is a node-level problem or lower-level software or hardware has problems

Topology – HA Level 3 – Vertical and Horizontal Clustering
Best Use – First level providing continuous operation at the WAS level
SPOFs – DB, FW, LDAP
Advantages – Nodal isolation, online maintenance, mixed versions possible
Disadvantage – More effort to maintain the system; HAManager needs NAS with lease-based locking (LL)
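Why horizontal clustering pays off can be shown with standard series/parallel availability arithmetic: components in a request path multiply, while redundant replicas fail only if all of them fail. A Python sketch assuming independent failures and made-up availability figures for a node, a JVM and a database; real failures are often correlated, so treat the results as optimistic.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Every component must be up (for example HTTP server -> WAS -> DB)."""
    return prod(availabilities)


def parallel(a: float, replicas: int) -> float:
    """At least one of `replicas` identical, independent copies must be up."""
    return 1 - (1 - a) ** replicas


# Illustrative assumptions only.
NODE, JVM, DB = 0.99, 0.98, 0.999

node_with_2_jvms = series(NODE, parallel(JVM, 2))
level2 = series(node_with_2_jvms, DB)               # vertical scaling, one node
level3 = series(parallel(node_with_2_jvms, 2), DB)  # vertical + horizontal
print(f"Level 2: {level2:.5f}  Level 3: {level3:.5f}  DB alone: {DB}")
```

With two nodes the single database becomes the limiting factor, which is the SPOF the next topology level targets.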

Topology – HA Level 4 – Database Clustering Failover
Best Use – 1st level providing continuous operation at WAS level
SPOFs – Admin servers (DMGR and node agent), LDAP
Advantages – Option to use ARM to restart the DMGR and node agents automatically
Disadvantage – While an admin server is down: no TPV (Tivoli Performance Viewer), no reconfiguration

Topology - HA Level 5

Failover Clustering Capacity – 2 Types
• IP-based cluster failover: slow, 1 to 5 minutes
- Tivoli System Automation
- HACMP (AIX)
• Non-IP cluster: 1 second to minutes, depending on configuration
- WAS WLM: HTTP plug-in in the HTTP server, EJB (CORBA distributed communication)
• Clustering database failover
- Slow: IP-based failover
- Fast: parallel database partitioning for DB2 UDB EE, Oracle Real Application Clusters (RAC), OPS

HAManager
Benefit – Enhanced availability in 2 areas
- Transaction services: transaction log recovery
- Messaging services
Why – A crashed or zombie JVM may leave in-flight transactions holding locked resources.
- Peers are blocked from the locked records: snowball effect
- Transactions are not completed
- Frequency is low, but the cost can be very high, because s#$% happens just when you don't want it to.
Options
- Restart the server (Booters): a slow process (WAS V5)
- Give another application server access: WAS V6 HAManager or IP-based cluster failover

Key Scenario for Transaction Services
1. A JVM crashes with transactions in progress: the transactions are in doubt.
2. Two-phase commit (2PC) may involve several resource managers (WMQ, DB2, SQL Server) with objects locked.
3. Without failover to another JVM's transaction services, resources stay locked until time-outs are reached.
4. Worse problem: other transactions may fail because they cannot obtain locks.
5. Cascading/snowball effect.

HAManager – Core Group
- The core group has an elected coordinator.
- Tracks information and state: names, members, policies, active/inactive
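The core group bookkeeping above can be modelled as a small data structure. A Python sketch under stated assumptions: the election rule (the lowest-sorting active member wins, so every member computes the same answer from the same membership view) is a hypothetical deterministic rule chosen for illustration, not necessarily the rule HAManager uses, and the member and policy names are made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CoreGroup:
    """Toy core group: members, their state, policies and a coordinator."""
    name: str
    policies: dict[str, str] = field(default_factory=dict)
    active: dict[str, bool] = field(default_factory=dict)  # member -> active?
    coordinator: Optional[str] = None

    def set_state(self, member: str, is_active: bool) -> None:
        """Record a member state change and re-run the election."""
        self.active[member] = is_active
        self.elect()

    def elect(self) -> None:
        # Assumed rule: the lowest-sorting active member becomes coordinator.
        live = sorted(m for m, up in self.active.items() if up)
        self.coordinator = live[0] if live else None


cg = CoreGroup("DefaultCoreGroup", policies={"TxLogRecovery": "One of N"})
for member in ["server3", "server1", "server2"]:
    cg.set_state(member, True)
print(cg.coordinator)
cg.set_state("server1", False)  # coordinator fails; survivors re-elect
print(cg.coordinator)
```

Because the rule is deterministic over shared membership state, no extra voting round is needed; the hard part in a real system is agreeing on that membership view.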

HAManager – Group Coordination

HAManager – Transaction Managers for Core Group

HAManager – How – WAS V6
The recovery process is started in a peer member of the cluster, which:
1. Waits for the lock time-out set by the crashed JVM to expire.
2. Completes the in-doubt transactions.
3. Releases locks in the backend resource managers.
4. Releases the transaction logs.
5. Performs no new work.
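The five recovery steps can be sketched as a peer processing the crashed member's log. A Python sketch with hypothetical names (ResourceManager, recover_peer_log); the presumed-abort rule it applies (commit only transactions with a logged commit decision, roll back the rest) is an assumption for illustration, not a statement of how the WAS transaction service decides outcomes.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Hypothetical backend such as a database, holding locks per transaction."""
    name: str
    locks: dict[str, set[str]] = field(default_factory=dict)  # tx ID -> locked rows
    outcomes: dict[str, str] = field(default_factory=dict)    # tx ID -> outcome

    def resolve(self, tx_id: str, outcome: str) -> None:
        self.outcomes[tx_id] = outcome
        self.locks.pop(tx_id, None)  # resolving the transaction frees its locks


def recover_peer_log(tx_log: dict[str, str], rms: list[ResourceManager],
                     now: float, lease_expires_at: float) -> dict[str, str]:
    """Recover a crashed member's transaction log from a peer.

    tx_log maps tx ID -> last logged state, e.g. "prepared" or "committed".
    Assumed presumed-abort rule: a logged commit decision is committed,
    anything else is rolled back.
    """
    # Step 1: the crashed JVM's lock on the log must have expired.
    if now < lease_expires_at:
        raise RuntimeError("log still locked by the crashed server")
    results = {}
    for tx_id, state in tx_log.items():
        outcome = "commit" if state == "committed" else "rollback"
        for rm in rms:  # steps 2 and 3: finish the tx, release backend locks
            if tx_id in rm.locks:
                rm.resolve(tx_id, outcome)
        results[tx_id] = outcome
    tx_log.clear()  # step 4: release the log; step 5: no new work is started
    return results


db = ResourceManager("DB2", locks={"tx1": {"row7"}, "tx2": {"row9"}})
mq = ResourceManager("WMQ", locks={"tx1": {"msg42"}})
log = {"tx1": "committed", "tx2": "prepared"}
print(recover_peer_log(log, [db, mq], now=46.0, lease_expires_at=45.0))
```

Once the backends have resolved both transactions, their locks are gone, which is what stops the snowball effect described in the transaction scenario.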

HA Manager – Scenario – 2PC with Resource Manager Locking (Database)


HAManager – Tx Services (Locking) Policies
- One of N policy requirements (the most typical policy)
- The shared file system must support automatic lock recovery.
- Locking is critical to prevent corruption of the transaction logs.
- Lock recovery is necessary to ensure access for a peer cluster member.
- Lock lease time (LLT) default: 45 seconds
  - HAManager fails over in 10 seconds, but LLT = 45
  - So HAManager must wait another 35 seconds
- Starting point: LLT = 10
  - HAManager: 12 seconds
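The lock lease numbers follow from simple arithmetic: the peer can take over the transaction log only after HAManager has detected the failure and the crashed server's lease has run out. A Python sketch, assuming the worst case in which the lease was renewed just before the crash, so it expires LLT seconds after the failure.

```python
def extra_wait_s(ha_failover_s: float, lock_lease_s: float) -> float:
    """Seconds HAManager still waits for the lease after it has failed over."""
    return max(0.0, lock_lease_s - ha_failover_s)


def log_available_after_s(ha_failover_s: float, lock_lease_s: float) -> float:
    """Earliest time after the crash at which the peer can own the tx log."""
    return max(ha_failover_s, lock_lease_s)


for llt in (45.0, 10.0):  # default LLT, then the suggested starting point
    print(f"LLT={llt:.0f}s: extra wait {extra_wait_s(10.0, llt):.0f}s, "
          f"log available after {log_available_after_s(10.0, llt):.0f}s")
```

Lowering LLT shortens recovery, but a very short lease risks a healthy yet slow server losing its lock, so it is a tuning trade-off.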

HAManager – How – Old Way – WAS V5
• The only way in WAS V5; possible in V6 but complex
- Requires IBM HACMP or Tivoli System Automation
- Shared drive: configuration repository, log files, transaction logs, WAS binaries
- IP addresses: each node has its own IP, plus a virtual IP for client access
- The HA software (HACMP) manages a group of IPs, disks, file systems and start/stop scripts for WAS
- On failure: move the IPs and disks, then start WAS
- Disadvantages
  - Slow recovery; the virtual IP must be on the same subnet (local only)
  - Complex

HAManager – Configuration Requirements
• Enablers: HAManager, an HA file system, a lease-based locking protocol
1. Visible transaction log: must be accessible to all members of the core group.
2. Platform: a highly available file system, e.g. IBM SAN FS or NAS
   - Needs a lease-based exclusive locking protocol
   - CIFS (Common Internet File System)
   - NFS V4
3. R/W access rights: all application servers must be able to read and write the logs before recovery can occur.
4. Consequences: if not, locks held by processes on the failed node will not be released automatically.
   - Transactions will not be completed, and the database is potentially impaired.
   - Peer servers can only recover in-flight transactions if the database locks are released.
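The lease-based exclusive locking requirement can be illustrated with an in-memory model. A Python sketch only: LeaseLock and its parameters are hypothetical, and a real deployment relies on the file system's own lease protocol (for example NFS V4 locking), not application code.

```python
import time
from typing import Callable, Optional


class LeaseLock:
    """Toy lease-based exclusive lock guarding a shared transaction log.

    The owner must keep renewing; if it crashes, the lease simply runs out
    and a peer can take over, which is the automatic lock recovery the
    file system has to provide.
    """

    def __init__(self, lease_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lease_s = lease_s
        self.clock = clock
        self.owner: Optional[str] = None
        self.expires_at = 0.0

    def acquire(self, who: str) -> bool:
        """Take or renew the lock; fails while another owner's lease is live."""
        now = self.clock()
        if self.owner in (None, who) or now >= self.expires_at:
            self.owner = who
            self.expires_at = now + self.lease_s
            return True
        return False


# Simulated clock so the example is deterministic.
t = [0.0]
lock = LeaseLock(lease_s=10.0, clock=lambda: t[0])
print(lock.acquire("server1"))  # server1 owns the log
t[0] = 5.0
print(lock.acquire("server2"))  # refused: server1's lease is still live
t[0] = 10.0                     # server1 crashed and never renewed
print(lock.acquire("server2"))  # lease expired, so the peer takes over
```

Without the expiry check, a crashed owner would hold the lock forever, which is exactly the "locks will not be automatically released" consequence listed above.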


SOA Management: Solution View
[Figure: SOA layers (consumers, business processes and process choreography, atomic and composite services, service components, operational systems and supporting middleware) overlaid with the management stack: Integrated Console (seamless views across different layers of abstraction), Business Process Management, Service Management, Application Monitoring, Resource Monitoring, Transaction Tracking and Integrated Reporting (enterprise-wide service level reporting).]

SOA Management: Example 2 – Digging out the CICS Data

Example 1: Reuse: Service Creation: Digging Out the CICS Data

SOA Management: Example 2 – Logical Architecture

SOA Management: Service Levels
• "Contracts," also known as Service Level Agreements, are established between service requestors and providers.
• The management focus turns to monitoring for compliance with agreed-upon service levels.
• "Active" management optimizes systems to avoid service violations.
[Figure: a service requestor invokes a service from a service provider (XML, WSDL, SOAP) under a Service Level Agreement covering quality of service: capacity, security and performance.]
Service Level Agreement terms:
• # of requests allowed
• Acceptable response time
• Charge per request
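Monitoring for compliance with the three example SLA terms can be sketched as a check over one measurement period. A Python sketch with hypothetical names and thresholds; the rule that requests beyond the allowed number are neither covered nor charged is an assumption, since real agreements define this themselves.

```python
from dataclasses import dataclass


@dataclass
class ServiceLevelAgreement:
    """Hypothetical SLA terms from the slide."""
    max_requests: int         # number of requests allowed
    max_response_ms: float    # acceptable response time
    charge_per_request: float


def check_compliance(sla: ServiceLevelAgreement,
                     response_times_ms: list[float]) -> dict[str, object]:
    """Compare observed response times for one period against the SLA."""
    covered = response_times_ms[: sla.max_requests]  # requests within quota
    slow = [t for t in covered if t > sla.max_response_ms]
    over_quota = max(0, len(response_times_ms) - sla.max_requests)
    return {
        "over_quota": over_quota,
        "slow_responses": len(slow),
        "compliant": not slow and over_quota == 0,
        "charge": round(len(covered) * sla.charge_per_request, 2),
    }


sla = ServiceLevelAgreement(max_requests=4, max_response_ms=200.0,
                            charge_per_request=0.05)
print(check_compliance(sla, [120.0, 180.0, 250.0, 90.0, 300.0]))
```

"Active" management would act on these results before the period ends, for example by adding capacity when slow responses start to accumulate.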

SOA Security: IBM Security Approach - MASS

SOA Security: Encompass All Aspects of Security
[Figure: SOA reference layers (consumers, business processes, services, service components, operational systems and supporting middleware) mapped to the aspects of SOA security.]
SOA Security
• Identity
• Authentication
• Authorization
• Auditing
• Confidentiality, Integrity and Availability
• Auditing & Compliance
• Administration and Policy Management
