High Availability and Disaster Recovery in PureApplication System


  1. A05 High Availability and Disaster Recovery for IBM PureApplication System
     Kyle Brown | Distinguished Engineer | IBM
     © IBM Corporation 2012
  2. Agenda
     • DR and HA Definitions
     • HA within the IPAS rack
     • HA Approaches for WAS in IPAS
     • DB2 HADR in IPAS
     • MQ HA considerations in IPAS
     • DR strategies for IPAS
  3. Executive Summary
     • High Availability and Disaster Recovery in PureApplication System are accomplished using the tools and procedures that most customers are already familiar with
     • What changes is the automation, which can make the process more repeatable and easier for many customers
  4. DEFINITIONS
  5. Definitions
     • High availability (HA)
        • The application cannot undergo an unplanned outage for more than a few seconds or minutes at a time, but can do so as often as necessary, or may be down for a few hours for scheduled maintenance.
        • This falls short of continuous availability, which does not allow for any outage.
        • Two scenarios: inter-rack and intra-rack
     • Disaster recovery (DR)
        • The reconstruction of your physical production site in an alternate physical site, occurring after the loss of your primary data center.
        • The process of bringing up servers and applications, in priority order, to support the business from the alternate site.
        • This environment may be substantially smaller than the entire production environment, as only a subset of production applications demand DR.
     IBM PureApplication System - Failover and Recovery
  6. Definitions
     • Recovery Point Objective (RPO): the maximum tolerable period in which data might be lost from an IT service due to a major incident
     • Recovery Time Objective (RTO): the duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity
     • Accepted service tiers for business applications
        • Tier 1: Continuous availability required, no loss of data tolerated
        • Tier 2: A few minutes of outage tolerated. Data restorable in time
        • Tier 3: A few hours of outage. Some in-flight transaction loss is acceptable
        • Tier 4: Application could be unavailable for a day or more
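
To make the two objectives concrete, here is a minimal Python sketch that checks one incident against an RPO and an RTO. The objective values and timestamps are invented for illustration; they do not come from the slides. The data-loss window is taken as the time between the last good copy of the data and the incident, and downtime as the time between the incident and service restoration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Objectives:
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time to restore service


def meets_objectives(obj, last_good_copy, incident, service_restored):
    """Return (rpo_ok, rto_ok) for one incident."""
    data_loss_window = incident - last_good_copy  # work since the last copy is lost
    downtime = service_restored - incident
    return data_loss_window <= obj.rpo, downtime <= obj.rto


# Hypothetical objectives: lose at most 15 minutes of data, be back within 4 hours.
objectives = Objectives(rpo=timedelta(minutes=15), rto=timedelta(hours=4))
result = meets_objectives(
    objectives,
    last_good_copy=datetime(2012, 10, 1, 2, 0),
    incident=datetime(2012, 10, 1, 2, 40),
    service_restored=datetime(2012, 10, 1, 5, 0),
)
print(result)  # (False, True): 40 minutes of data lost, 2 h 20 min of downtime
```

In this example the RTO is met but the RPO is not, which is the situation that pushes an application toward more frequent copies or replication.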
  7. HIGH AVAILABILITY FEATURES IN IPAS
  8. High Availability in IPAS
     • PureApplication System has intra-rack HA by design
        • Redundant hardware, networking and storage
        • No single points of failure
        • Recovery from hardware failures with workload mobility
        • Additional capacity can be added and utilized without any service interruption
     • Additional local racks can increase HA further
        • WebSphere clustering across racks can be configured
        • MQ and other patterns can also be clustered similarly
  9. Hardware high availability
     • Management nodes and Virtualization mgmt node
        • Both have redundant backup servers
        • A floating IP address is assigned to the active Management Node that has the management (workload deployer) function
     • Network controllers
        • Switches and cabling are redundant (2 of each)
        • Failure of 1 leads to reduced bandwidth, not loss of service
     • Storage controllers
        • Each controller has two canisters that can service all traffic needed
        • If one fails, the other handles all I/O
     • Storage
        • SSD and HDD storage is configured in RAID5 + spares
        • Tolerates 2 concurrent drive failures without data loss (after spares are in use)
     • Compute nodes
        • The management system will route around failed DIMMs or cores
        • If an entire node fails, PureApplication System will try to move the VM to another compute node within its Cloud Group if free space is available. This is called "Rebalance" or "Workload evacuation"
        • If only limited resources are available on other nodes to handle the extra load, VMs are started based on their priorities
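
The priority-based restart described for compute nodes can be sketched as follows. This is only an illustration of the idea, not PureApplication System's actual placement algorithm, which the slides do not describe. The `evacuate` function, the capacity units, and the convention that a larger priority number means a more important VM are all assumptions.

```python
def evacuate(vms, free_capacity):
    """Place VMs from a failed node onto surviving nodes, highest priority first.

    vms: list of (name, priority, size); a larger priority number is assumed
         to mean a more important VM.
    free_capacity: dict mapping surviving node name -> free capacity units.
    Returns (placement, unplaced).
    """
    remaining = dict(free_capacity)
    placement, unplaced = {}, []
    for name, _priority, size in sorted(vms, key=lambda vm: vm[1], reverse=True):
        # Pick the surviving node with the most free capacity that still fits the VM.
        candidates = [node for node, free in remaining.items() if free >= size]
        if not candidates:
            unplaced.append(name)  # not enough room left in the cloud group
            continue
        target = max(candidates, key=lambda node: remaining[node])
        remaining[target] -= size
        placement[name] = target
    return placement, unplaced


# Hypothetical cloud group: three VMs to move, room for only two of them.
placement, unplaced = evacuate(
    [("web", 1, 4), ("db", 3, 4), ("batch", 2, 4)],
    {"node2": 4, "node3": 4},
)
print(placement)  # {'db': 'node2', 'batch': 'node3'}
print(unplaced)   # ['web']
```

With capacity for only two VMs, the lowest-priority VM is the one left waiting.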
  10. Virtual applications – WebSphere App. Server failover and recovery
      • WebSphere Application Server as part of a virtual application deployment is considered non-persistent (it should not have any state in the VM)
      • Hence, in case of VM failure, PureApplication System can re-create another VM with no side effect
      Type of failure: WebSphere App. Server failure within VM (VM is still running)
         What happens: PureApplication System will monitor WebSphere process failures and will restart WebSphere once a day (to prevent spinning of middleware). If scaling is enabled, PureApplication System will start another instance if the SLA is not satisfied.
         Deployer action: None needed for recovery. The deployer needs to follow up on this scenario to understand the cause of failure using SSH, logs, and so on.
      Type of failure: VM containing WebSphere fails
         What happens: PureApplication System detects the VM failure and will re-spin another VM, assigning a new IP address (WebSphere in a virtual application is to be non-persistent). If scaling, routing or caching policies are enabled, PureApplication System will link the instance to the appropriate Shared service.
         Deployer action: None needed; this is handled by PureApplication System.
  11. Virtual systems – WebSphere App. Server failover and recovery
      • This is similar to traditional WebSphere Application Server failure scenarios and is handled like the normal case. PureApplication System does not take any special action
      Type of failure: WebSphere process failure within VM
         What happens: PureApplication System does not monitor WebSphere process failure and will not restart WebSphere. If a WebSphere application server node in a Network Deployment topology goes down, the Node Agent will restart the WebSphere node (normal WebSphere function). If the Node Agent or Deployment Manager fails, follow the normal WebSphere debug procedure.
         Deployer action: The deployer needs to look into the WebSphere logs. The deployer can log into the VM from the PureApplication System console and restart WebSphere as needed. Look at the monitoring data.
      Type of failure: VM containing WebSphere fails
         What happens: PureApplication System detects the VM failure, but there is no VM recovery.
         Deployer action: Try restarting the VM. Add another clone for the Virtual Machine instance that represents the WebSphere node.
  12. Virtual applications – DBaaS failover and recovery
      • The DBaaS (DB2) VM as part of a virtual application is considered persistent
      • Hence, in case of failure, PureApplication System will not create another Virtual Machine
      Type of failure: DB2 failure within VM (VM is still running)
         What happens: PureApplication System does not monitor DB2 failures and hence does not attempt to restart DB2.
         Deployer action: Look into the DB2 logs to identify the failure. SSH into the VM and start DB2. Restart the VM.
      Type of failure: VM containing DB2 fails
         What happens: PureApplication System detects the VM failure and will restart the DB2 VM once (only) at the same IP address (since DB2 is considered persistent).
         Deployer action: If the VM does not come up after 1 retry, the deployer will need to create a new DBaaS instance (create a new VM) and then restore the DB2 data from TSM backup (hence backup is essential).
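
The three failure tables above can be condensed into one lookup, which may help when writing runbooks. The sketch below restates the slide content as a Python dictionary. The enum names and the `platform_attempts_recovery` helper are invented for illustration, and the boolean only records whether the platform itself tries to recover, not whether that attempt succeeds.

```python
from enum import Enum


class Workload(Enum):
    VAPP_WAS = "WebSphere in a virtual application"
    VSYS_WAS = "WebSphere in a virtual system"
    VAPP_DBAAS = "DBaaS (DB2) in a virtual application"


class Failure(Enum):
    PROCESS = "middleware fails, VM still running"
    VM = "VM fails"


# (workload, failure) -> (what the platform does, platform attempts recovery?)
RECOVERY = {
    (Workload.VAPP_WAS, Failure.PROCESS): ("monitors and restarts WebSphere", True),
    (Workload.VAPP_WAS, Failure.VM): ("re-spins a new VM with a new IP address", True),
    (Workload.VSYS_WAS, Failure.PROCESS): ("does not monitor or restart WebSphere", False),
    (Workload.VSYS_WAS, Failure.VM): ("detects the failure, no VM recovery", False),
    (Workload.VAPP_DBAAS, Failure.PROCESS): ("does not monitor or restart DB2", False),
    (Workload.VAPP_DBAAS, Failure.VM): ("restarts the VM once at the same IP address", True),
}


def platform_attempts_recovery(workload, failure):
    """True if PureApplication System itself tries to recover from this failure."""
    _action, automatic = RECOVERY[(workload, failure)]
    return automatic


for (workload, failure), (action, automatic) in RECOVERY.items():
    owner = "platform" if automatic else "deployer"
    print(f"{workload.value} / {failure.value}: {action} -> first responder: {owner}")
```

Note that even where the platform acts first, the DBaaS VM case still falls back to the deployer and a TSM restore if the single restart attempt fails.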
  13. WEBSPHERE HA IN IPAS
  14. WAS HA – intra site
      [Diagram: a load balancer in front of two racks, Active (primary) and Active (secondary). Pattern 1 is deployed on the primary and contains an HTTP server plus the deployment manager and application server nodes. Pattern 2 is deployed on the secondary and contains additional application server nodes federated into the Dmgr on the primary. A shared database is accessed from both sites.]
  15. Creating your patterns
      • Step 1: Create a Clustered Pattern on IPAS A
      • Step 2: Create a Custom Node Pattern on IPAS B
         • Make sure that the Federate node drop-down is selected
  16. Deploying your patterns
      • Step 3: Deploy the Cluster Pattern on IPAS A
         • Note the Deployment Manager's hostname
      • Step 4: Deploy the Custom Node Pattern
         • Make sure the Deployment Manager hostname under the Custom Nodes properties matches the hostname from the previous step
  17. Pros and Cons of the Single-Cell approach
      • The single-cell approach has both advantages and disadvantages, but the disadvantages are enough that we do not recommend it for cross-site use
      • Advantages
         • Single point of administration
         • Only need to perform administration actions once
      • Disadvantages
         • Single point of failure (Dmgr)
            • The system will continue to run, although you cannot make any administration changes
         • Susceptible to "split brain" issues when network connectivity between racks is cut
            • Core groups in particular are susceptible to potentially long split and reconnect times in large cells
  18. WAS HA – cross site
      [Diagram: a load balancer in front of two racks, Active (primary) and Active (secondary). Pattern 1 is deployed on the primary and contains an HTTP server plus the deployment manager server and application servers. A clone of Pattern 1 (Export/Import), with its own HTTP server and application servers, is deployed on the secondary. A shared database is accessed from both sites.]
  19. Exporting and Importing the pattern
      [Diagram: Active (primary) exports to, and Active (secondary) imports from, an SSH storage server holding patterns with images, scripts, and add-ons.]
      • We recommend setting up an external SSH storage server accessible from both racks for Export/Import tasks
      • The server should have enough space to host patterns, images and scripts during the export/import tasks
      • In a typical environment, 100 GB of space should be sufficient, though this can change based on the size of the images you are using and how frequently you are transferring patterns and images
      • Download and install the CLI from IBM PureApplication System, and configure it to work with the racks
  20. Deployment through Normal Operation
      [Diagram: a load balancer in front of the Primary and Secondary racks, each running HTTP servers, WAS nodes and a DMgr, with a shared database and off-rack storage between them.]
      1. WAS cluster pattern deploys on primary
      2. CLI runs script to export cluster pattern (to off-rack storage)
      3. CLI runs script to import cluster pattern
      4. WAS cluster pattern deploys on secondary rack
      5. Routing rules manually configured for HTTP server addresses and ports (must be done after each pattern deployment)
  21. DB2 HADR
  22. Virtual systems – DB2 failover and recovery
      • This is similar to traditional DB2 failure scenarios and is handled like the normal case. PureApplication System does not take any special action
      Type of failure: DB2 failure within VM (VM is still running)
         What happens: PureApplication System does not monitor DB2 failures and hence does not attempt to restart DB2.
         Deployer action: The client needs to view the DB2 logs and log into the VM to fix or restart DB2. Follow normal best practice for DB2 recovery. Use DB2 HADR.
      Type of failure: VM containing DB2 fails
         What happens: PureApplication System detects the VM failure, but there is no restart of the VM in a virtual system.
         Deployer action: The client will need to look into the logs, determine the failure, and restart the VM or create a new VM. Standard DB2 backup/restore will need to be implemented.
  23. A simple DB2 HADR Topology for two IPAS racks
      [Diagram: Rack A hosts the DB2 HADR Primary; Rack B hosts the DB2 HADR Secondary.]
  24. Creating the Standby DB2 HADR
      1. On the Secondary system, create a virtual system using the "DB2 Enterprise HADR Standby" part
      2. Deploy the virtual system
      3. View the deployed pattern and write down the information about the standby, such as the password for root, the port, and the host (Network interface 1: ipas-lpar-008-020.purescale.raleigh.ibm.com (172.17.8.20))
  25. Creating and configuring the Primary DB2 HADR
      1. On the Primary system, create a virtual system using the "DB2 Enterprise HADR Primary" part
      2. Deploy the virtual system
      3. Configure the virtual part: enter the hostname you wrote down in the standby hostname field, along with the standby root password and the standby DB2 service port
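
The slides do not show what the HADR Primary and Standby parts run internally. For orientation, the sketch below builds the standard DB2 commands an administrator would use to set up the same pairing by hand: the `HADR_*` database configuration parameters followed by `START HADR`. The database name, instance name, hostnames and ports are hypothetical placeholders, and `hadr_commands` is an illustrative helper, not part of any IBM tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HadrPeer:
    host: str      # hostname of the DB2 server
    port: int      # HADR service port on that server
    instance: str  # DB2 instance name


def hadr_commands(db, local, remote, role, syncmode="NEARSYNC"):
    """Build the DB2 CLP commands to configure and start HADR on one side."""
    if role not in ("primary", "standby"):
        raise ValueError("role must be 'primary' or 'standby'")
    configure = (
        f"db2 update db cfg for {db} using "
        f"HADR_LOCAL_HOST {local.host} HADR_LOCAL_SVC {local.port} "
        f"HADR_REMOTE_HOST {remote.host} HADR_REMOTE_SVC {remote.port} "
        f"HADR_REMOTE_INST {remote.instance} HADR_SYNCMODE {syncmode}"
    )
    start = f"db2 start hadr on db {db} as {role}"
    return [configure, start]


# Hypothetical hosts on the two racks; not taken from the slides.
rack_a = HadrPeer("db2-rack-a.example.com", 55001, "db2inst1")
rack_b = HadrPeer("db2-rack-b.example.com", 55002, "db2inst1")

# The standby is started first, matching the order of the two slides above.
for line in hadr_commands("SAMPLE", rack_b, rack_a, "standby"):
    print(line)
for line in hadr_commands("SAMPLE", rack_a, rack_b, "primary"):
    print(line)
```

Each side names itself as the local host and the other rack as the remote host, which is why the standby's hostname and service port have to be written down before the primary is configured.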
  26. MQ HIGH AVAILABILITY
  27. Achieving MQ High Availability in IPAS
      • MQ Multi-instance is a feature added in MQ 7.0 to provide basic failover support
         • It can be used in situations like IPAS where an HA coordinator is not a good option
      • The queue manager can be started on different machines
         • Active instance
         • Standby instance
      • Shared Queue Manager data is retained on network storage
  28. Highly Available MQ and Message Broker Configuration
      • Two Message Brokers are configured in an Active-Active HA pattern on nodes:
         • bk01.ipas.ibm.com
         • bk02.ipas.ibm.com
      • The Queue Managers hosting the Brokers are clustered for load balancing of messages
      • The Queue Managers and the Broker components are shared from the NFS 4 node: nfs.ipas.ibm.com
      • Multi-Instance Queue Manager (MIQM) technology detects and fails over the Broker and Queue Manager
      [Diagram: node bk01.ipas.ibm.com hosts queue manager QM.BK.01 and broker BK.01; node bk02.ipas.ibm.com hosts queue manager QM.BK.02 and broker BK.02. Both queue managers hold a repository for cluster CLUS.BK and are linked as MIQM pairs. Node nfs.ipas.ibm.com exports the file systems /opt/MBShare and /opt/MQShare.]
      Note that in a real customer environment, the shared storage would be held outside the IPAS box.
  29. Details of MQ Multi Instance Deployment
      • Images
         • WebSphere Message Broker Hypervisor Edition 8.0.0.1
         • WebSphere MQ Hypervisor Edition 7.5.0.1 (when using only MQ)
      • Parts
         • WebSphere Message Broker Advanced
         • WebSphere MQ – Basic (when using only MQ)
         • Core OS (RHEL part for NFS Server)
      • Scripts
         • pasconn.addgroup – Create or modify a group with the specified group ID
         • pasconn.adduser – Create or modify a user with the specified user ID
         • pasconn.delmb – Delete the default Message Broker on the node
         • pasconn.nfsshare – Create and share the NFS directory
         • pasconn.nfsmount – Create and mount the shared NFS directory
         • pasconn.hamb – Create a MIQM Queue Manager and Broker, primary and standby, on the remote node
         • pasconn.clusqmgr – Create a Cluster Repository on the Queue Manager and join the other Repository
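
The pasconn.* scripts listed above are not reproduced in the deck. As a rough illustration of what creating a multi-instance queue manager involves, the sketch below assembles the standard WebSphere MQ control commands for the two nodes. The helper names and the `data` and `log` subdirectories under the share are assumptions; the queue manager name and share path are borrowed from slide 28. The `addmqinf` command is passed in as a parameter because it is printed by `dspmqinf -o command` on the first node and should be copied from there rather than written by hand.

```python
def first_node_commands(qmgr, share):
    """Commands for the node that creates the queue manager on shared storage."""
    return [
        # Create the queue manager with its data and logs on the shared file system.
        f"crtmqm -md {share}/data -ld {share}/log {qmgr}",
        # Print the addmqinf command that the second node needs.
        f"dspmqinf -o command {qmgr}",
        # Start this instance and permit a standby instance elsewhere.
        f"strmqm -x {qmgr}",
    ]


def second_node_commands(qmgr, addmqinf_command):
    """Commands for the node that hosts the standby instance."""
    return [
        addmqinf_command,         # registers the existing queue manager on this node
        f"strmqm -x {qmgr}",      # starts as standby while the other instance is active
        f"dspmq -x -m {qmgr}",    # shows which instance is active and which is standby
    ]


node1 = first_node_commands("QM.BK.01", "/opt/MQShare")
node2 = second_node_commands("QM.BK.01", "<addmqinf command printed on the first node>")
for line in node1 + node2:
    print(line)
```

Both nodes must mount the same shared directory before any of these commands run, which is the job the NFS share and mount scripts perform in the pattern.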
  30. DISASTER RECOVERY
  31. Disaster Recovery
      • Business-critical applications that require DR can be hosted on PureApplication systems
         • Different techniques are available to cope with Tier 1, 2 and lower-tier requirements
      • Specific scripting and configuration will be necessary for each product in the portfolio
         • DR details are different for WAS, MQ, DB2, etc.
      • PureApplication integrates with existing client HA/DR capabilities in their environments
         • Network load balancing and failover
         • Database architectures
         • Existing backup/recovery tools
  32. DR basics
      [Diagram: Active (primary) and Standby (backup) racks. Management Data** and Application State* are copied from the primary to the backup, and a manual network change redirects traffic.]
      • DR in PureApplication System consists of three basic steps
         • Replicate the management data
         • Replicate the application state
         • Redirect network traffic from the primary to the backup
      • What distinguishes each tier is the tolerance for loss of application data; that dictates different approaches that meet different levels of RPO
      * Application State includes messages in queues, database contents, transaction logs, etc.
      ** Management data includes pattern definitions, user IDs, cloud group definitions, etc.
  33. DR solution ranges
      [Diagram: three DR approaches (Backup and Restore, File Replication, Shared File Systems) shown against an RPO time axis marked Zero, Seconds, Minutes, Hours, Days.]
  34. DR – Backup and Restore
      Applies to Tier 3 & 4
      [Diagram: Active (primary) and Cold Standby (secondary). The pattern is deployed on the primary and a copy of the pattern on the secondary. Application state is backed up to offsite storage, then restored and restarted on the secondary, followed by a manual network change.]
      • Deploy patterns on both systems (Export/Import)
      • Back up and restore critical application state on a regular basis using backup solutions like Tivoli Storage Manager
      Offsite backup storage is a standard best practice.
  35. DR – File Replication
      Applies to Tier 2 & 3
      [Diagram: Active (primary) and Warm Standby (secondary). The pattern is deployed on the primary; a copy of the pattern is deployed and quiesced on the secondary. Application state is copied from primary to secondary, the database is replicated to a database replica, and failover uses a manual network change.]
      • Deploy patterns on both systems (Export/Import)
      • Install a file synchronization solution on both racks, actively copying selected data from primary to secondary
      • Use existing replication solutions like DB2 HADR where available, on or off PAS
  36. DR – Shared File System
      Applies to Tier 1
      [Diagram: Active (primary) and Warmer Standby (secondary). The pattern is deployed on the primary; a copy of the pattern is deployed on the secondary with middleware stopped. Application state lives on a shared file system, the database is replicated to a database replica, and failover uses a manual network change.]
      • Develop custom patterns utilizing a shared file system, deploy on both racks, and stop the middleware on the backup
      • On failure, change over the network and start the middleware on the standby server
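
The tier coverage stated on the three DR slides above can be collected into one lookup for planning purposes. The sketch below restates those "Applies to Tier ..." labels in Python; the `approaches_for_tier` helper and the dictionary layout are invented for illustration.

```python
# DR approach -> standby type and the service tiers the slides say it applies to.
APPROACH_TIERS = {
    "Backup and Restore": {"standby": "cold", "tiers": {3, 4}},
    "File Replication": {"standby": "warm", "tiers": {2, 3}},
    "Shared File System": {"standby": "warmer", "tiers": {1}},
}


def approaches_for_tier(tier):
    """Return the DR approaches the slides list for a given service tier (1 to 4)."""
    if tier not in (1, 2, 3, 4):
        raise ValueError("tier must be 1, 2, 3 or 4")
    return [name for name, info in APPROACH_TIERS.items() if tier in info["tiers"]]


for tier in (1, 2, 3, 4):
    print(tier, approaches_for_tier(tier))
# 1 ['Shared File System']
# 2 ['File Replication']
# 3 ['Backup and Restore', 'File Replication']
# 4 ['Backup and Restore']
```

Tier 3 is the only tier with two candidate approaches, so the choice there comes down to how much data loss the application can accept.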
  37. SUMMARY
  38. We've seen
      • What DR and HA are
      • IPAS's HA features
      • HA Approaches for WAS in IPAS
      • DB2 HADR in IPAS
      • MQ HA considerations in IPAS
      • DR strategies for IPAS
  39. IBM WebSphere Technical Convention 2012 – Berlin, Germany
      Questions?
      As a reminder, please fill out a session evaluation
  40. Copyright Information
      © Copyright IBM Corporation 2012. All Rights Reserved. IBM, the IBM logo, ibm.com, AppScan, CICS, Cloudburst, Cognos, CPLEX, DataPower, DB2, FileNet, ILOG, IMS, InfoSphere, Lotus, Lotus Notes, Maximo, Quickr, Rational, Rational Team Concert, Sametime, Tivoli, WebSphere, and z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml.
      Coremetrics is a trademark or registered trademark of Coremetrics, Inc., an IBM Company.
      SPSS is a trademark or registered trademark of SPSS, Inc. (or its affiliates), an IBM Company.
      Unica is a trademark or registered trademark of Unica Corporation, an IBM Company.
      Java and all Java-based trademarks and logos are trademarks of Oracle and/or its affiliates. Other company, product and service names may be trademarks or service marks of others. References in this publication to IBM products and services do not imply that IBM intends to make them available in all countries in which IBM operates.
