Disaster Recovery Strategies for IMS

  1. 1. IBM Software Group IMS Application Dependent & Mirroring Overview Disaster Recovery Solutions Glenn Galler IBM SW IT Specialist, ATS Ann Arbor, Michigan gallerg@us.ibm.com © 2008 IBM Corporation
  2. 2. Trademarks The following are trademarks of the International Business Machines Corporation in the United States and/or other countries. AIX* GDPS* S/390* CICS* HyperSwap Sysplex Timer* DB2* IBM* Tivoli* e-business logo* IBM eServer* TotalStorage* Enterprise Storage Server* IBM logo* z/OS* ESCON* NetView* z/VM* FICON OS/390* zSeries* FlashCopy* Parallel Sysplex* * Registered trademarks of IBM Corporation The following are trademarks or registered trademarks of other companies. Intel is a trademark of the Intel Corporation in the United States and other countries. Java and all Java-related trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc., in the United States and other countries. Microsoft, Windows and Windows NT are registered trademarks of Microsoft Corporation. SET and Secure Electronic Transaction are trademarks owned by SET Secure Electronic Transaction LLC. UNIX is a registered trademark of The Open Group in the United States and other countries. * All other products may be trademarks or registered trademarks of their respective companies. Notes: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply. All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. 
Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions. This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area. All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography. This presentation and the claims outlined in it were reviewed for compliance with US law. Adaptations of these claims for use in other geographies must be reviewed by the local country counsel for compliance with local laws. 2
  3. 3. Acknowledgments • Peter Armstrong (BMC) • Book: “DBRC in Practice” • http://www.dbazine.com/ofinterest/oi-articles/armstrong6 • Technical Support • Rich Lewis (ATS) • Helene Lyon (IBM France) • DM Tools • Mitch Dooley and Lynne Bisceglia (IMS Recovery Expert) • Bob Magid (DM Tools Architect) 3
  4. 4. Agenda • Two Disaster Recovery Strategies for IMS • IMS Application Dependent DR • Storage Management Mirroring • IMS Application Dependent DR • Managing image copies, Recons, and Logs at a Remote Site • IBM DM Tools can assist with this DR strategy • Storage Management Mirroring • Production data is mirrored to Remote Site • Creating Consistency is the Key • Mirroring can be Asynchronous or Synchronous • GDPS can be used to automate DR strategy 4
  5. 5. Concepts • Disaster Recovery • Process of recovering a Production environment • Restore to point where business can be conducted • Recovery Time Objective (RTO) • Time allowed to recover the applications • All critical operations are up and running again • Recovery Point Objective (RPO) • Amount of data lost in the disaster • Last point-in-time when all data was current 5
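The two objectives on the Concepts slide can be checked against an actual (or rehearsed) outage. The following is a minimal sketch, assuming three timestamps are known; the function name and the sample timeline are hypothetical and are not taken from the presentation.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_consistent, disaster, service_restored):
    """Return (achieved RPO, achieved RTO) as timedeltas.

    last_consistent  : last point in time at which all data was current
    disaster         : moment the production site was lost
    service_restored : moment critical operations were running again
    """
    achieved_rpo = disaster - last_consistent    # window of updates that is lost
    achieved_rto = service_restored - disaster   # length of the outage
    return achieved_rpo, achieved_rto


# Hypothetical timeline: image copy at 02:00, disaster at 14:30, restart done at 18:30.
rpo, rto = recovery_metrics(
    datetime(2008, 5, 1, 2, 0),
    datetime(2008, 5, 1, 14, 30),
    datetime(2008, 5, 1, 18, 30),
)
print("data lost:", rpo, "outage:", rto)
```

Comparing these achieved values with the stated objectives shows whether a given DR strategy is adequate for the business.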
  6. 6. IMS Application Dependent Disaster Recovery 6
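  7. 7. Production Site [Diagram: IMS Control Region, DLI/SAS, DBRC, Logger, OLDS, WADS, RDS, SLDS, RLDS, RECON, Backup RECON, Image Copies and Change Accumulation at the Production Site, with the Remote Site as the target]
  8. 8. Remote Site [Diagram: the same components (IMS Control Region, DLI/SAS, DBRC, Logger, OLDS, WADS, RDS, SLDS, RLDS, RECON, Backup RECON, Image Copies and Change Accumulation) at the Remote Site, with the Production Site as the source]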
  9. 9. 3 Application Dependent DR Strategies • #1: Recover the Databases to an Earlier Image Copy • Image Copies and Recon are shipped to remote site • Production activity is quiesced for image copy • Procedure may include IMS, DB2 and CICS • Data since last image copy is lost • RTO is low since recovery time is short • RPO is high since the log updates are lost [Diagram: Image Copies]
  10. 10. 3 Application Dependent DR Strategies • #2: Recover the Databases to the Last Good Log Data Set • Image Copies, Recon, Logs are shipped to remote site • Forward and Backward recovery is performed with the logs • RTO is higher as recoveries are required • RPO is lower as log updates are applied to image copy [Diagram: Image Copies + SLDS/RLDS + Change Accumulation]
  11. 11. 3 Application Dependent DR Strategies • #3: Recover the Databases to the IMS Recovery Expert RP • Image Copies, Recon, Logs are shipped to remote site • Forward recovery is performed to the RP with the logs • RTO is higher as recoveries are required • RPO is lower as log updates are applied to image copy • IBM Tools provide additional capability [Diagram: Image Copies + SLDS/RLDS + Change Accumulation + RP]
  12. 12. Recover DBs to Earlier Image Copy 12
  13. 13. Recover DBs to Earlier Image Copy • Primary: Step 1: • Quiesce the IMS environment • Take Batch (Clean) Image Copies periodically • Ship Image Copies + Recon to Remote Site 13
  14. 14. Recover DBs to Earlier Image Copy • Remote: Step 2: Clean Backup Recon • Active Subsystems • If Recon shows active SUBSYSTEMs • Use LIST.SUBSYS to show active subsystems • Issue DBRC Commands: • CHANGE.SUBSYS SSID(ssidname) ABNORMAL • CHANGE.SUBSYS SSID(ssidname) STARTRCV • CHANGE.SUBSYS SSID(ssidname) ENDRECOV • DELETE.SUBSYS SSID(ssidname) • Secondary Image Copies • If Image Copy at Remote site is a Secondary Image Copy • Flag the Primary Image Copy in the Recon as Invalid • Fast Path DEDBs • Flag DEDB Areas as Recovery Needed to enable recoveries
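When several subsystems are still marked active in the backup Recon, the four-command sequence above has to be repeated for each one. The following is a minimal sketch that only builds the command text shown on the slide; the function name and the subsystem IDs IMSA and IMSB are hypothetical, and how the text is fed to DBRC is left out.

```python
def subsys_cleanup_commands(ssids):
    """Build the DBRC subsystem clean-up sequence from the slide for each SSID."""
    commands = []
    for ssid in ssids:
        commands += [
            f"CHANGE.SUBSYS SSID({ssid}) ABNORMAL",   # mark the subsystem as abnormally ended
            f"CHANGE.SUBSYS SSID({ssid}) STARTRCV",   # signal that recovery has started
            f"CHANGE.SUBSYS SSID({ssid}) ENDRECOV",   # signal that recovery has completed
            f"DELETE.SUBSYS SSID({ssid})",            # remove the subsystem record
        ]
    return commands


# Hypothetical subsystem IDs, as they might be reported by LIST.SUBSYS.
cleanup = subsys_cleanup_commands(["IMSA", "IMSB"])
print("\n".join(cleanup))
```

Keeping the order fixed per subsystem matters, because the delete is only meaningful once the subsystem is no longer flagged as active or in recovery.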
  15. 15. Recover DBs to Earlier Image Copy • Remote: Step 3: Recover DBs to Clean IC • Use the cleaned-up Recon to generate recovery JCL with GENJCL.RECOV • Recover database data sets: • Standard IMS Recovery Utility (DFSURDB0) • Or, IMS Database Recovery Facility (DRF) • HPIC Incremental Image Copies • Must be recovered with IMS Database Recovery Facility (DRF) • HPIC COPY Image Copies • Exact copy of database, no recovery is needed • Ensure Database Data Set name matches Image Copies
  16. 16. Recover DBs to Earlier Image Copy • Remote: Step 4: Cold Start IMS Systems • Databases are consistent with earlier Image Copy 16
  17. 17. Recover DBs to Last Good Log 17
  18. 18. Recover DBs to Last Good Log • Primary: Step 1: • Take Batch (Clean) or CIC (Fuzzy) Image Copies • Optionally take Change Accumulations periodically • Ship ICs + Recon + SLDS/RLDS + CAs to Remote Site 18
  19. 19. Recover DBs to Last Good Log • Remote: Step 2A: Clean Backup Recon • To use DBRC Recon for Recovery • No Databases can be Allocated on an OPEN log • Close and Archive OLDS in Recon data • When OLDS are not at the Remote Site • All PRILOG, PRISLD, PRIOLD need non-zero Stop Times • Two Methods: 1. Add dummy SLDS log entry and Start Archive 2. Close & flag INUSE OLDS as Archive Needed & Archive
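The rule that every PRILOG, PRISLD and PRIOLD record needs a non-zero stop time can be expressed as a simple check. The following is a sketch only, assuming the Recon listing has already been parsed into records; the `LogRecord` class, its fields, and the sample timestamps are hypothetical and do not reflect the real Recon layout.

```python
from dataclasses import dataclass
from typing import Optional

# Record types that the slide says must all be closed before recovery.
LOG_TYPES = {"PRILOG", "PRISLD", "PRIOLD"}


@dataclass
class LogRecord:
    record_type: str
    start_time: str
    stop_time: Optional[str]  # None models a zero (still open) stop time


def open_log_records(records):
    """Return the log records that still have no stop time."""
    return [r for r in records if r.record_type in LOG_TYPES and r.stop_time is None]


# Hypothetical parsed records from a backup Recon.
records = [
    LogRecord("PRILOG", "2008-05-01T02:00", None),
    LogRecord("PRISLD", "2008-05-01T02:00", "2008-05-01T06:00"),
    LogRecord("PRIOLD", "2008-05-01T06:00", None),
]
still_open = open_log_records(records)
print([r.record_type for r in still_open])
```

An empty result would indicate that the Recon satisfies this particular precondition; any record returned still needs one of the two close-and-archive methods listed on the slide.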
  20. 20. Recover DBs to Last Good Log • Remote: Step 2B: Clean the Backup Recon • Active Subsystems • If Recon shows active SUBSYSTEMs • Use LIST.SUBSYS to show active subsystems • Issue DBRC Commands: • CHANGE.SUBSYS ABNORMAL • CHANGE.SUBSYS STARTRCV • CHANGE.SUBSYS ENDRECOV • DELETE.SUBSYS • Secondary Image Copies • If Image Copy at Remote site is a Secondary Image Copy • Flag the Primary Image Copy in the Recon as Invalid • Fast Path DEDBs • Flag DEDB Areas as Recovery Needed to enable recoveries
  21. 21. Recover DBs to Last Good Log • Remote: Step 2C: Clean the Backup Recon • If copies of CA data sets are at remote site • Recon will show CA data sets from Production site • Use CHANGE.CA to point to correct copy of CA • If CA data sets are unavailable and exist in Recon • Flag the Recon to show CA is INVALID • DBRC will use the logs instead of the CAs 21
  22. 22. Recover DBs to Last Good Log • Remote: Step 3: Recover DBs from ICs, Logs, CA • Use the cleaned-up Recon to generate recovery JCL with GENJCL.RECOV • Recover DBs from ICs + CAs + SLDS/RLDS • Standard IMS Recovery Utility (DFSURDB0) • Or, IMS Database Recovery Facility (DRF) • HPIC Incremental ICs • Use IMS Database Recovery Facility (DRF) [Diagram: RECON + Image Copies + SLDS/RLDS + Change Accumulation]
  23. 23. Recover Database to Last Good Log • Remote: Step 4: Batch Backout • Following Full Database Recovery • Backout Inflight UOWs • Cold Start or ERE COLDSYS or ERE COLDBASE • Following Timestamp (RP) or Point-In-Time Recovery • No Inflight UOWs to backout • Cold Start or ERE 23
  24. 24. Recover Databases to Last Good Log • Remote: Step 5: Cold Start or /ERE from SLDS • If /ERE From SLDS • Ensure SUBSYS records have been cleaned up • Online IMS SUBSYS record should not be deleted • IMS dynamically allocates SLDS • OLDS were archived for GENJCL.RECOV 24
  25. 25. Recover DBs to IMS Recovery Expert RP 25
  26. 26. Recover DBs to IMS Recovery Expert RP • Primary: Step 1: Create Backup Datasets • Take Batch (Clean) or CIC (Fuzzy) Image Copies • Optionally take Change Accumulations periodically • Create IMS Recovery Expert RPs periodically • Create and Clean Backup Recon • Ship ICs + Clean Backup Recon + SLDS/RLDS + CAs
  27. 27. Recover DBs to IMS Recovery Expert RP • Primary: Step 2: Clean Backup RECON • IMS Recovery Expert Recon Clean Up (RCU) • Cleanup Timestamp • Last DEALLOC or Stop Time of SLDS/RLDS • Closes open PRILOG, PRIOLD and SECSLD records • Deletes PRIOLD, SECOLD and SUBSYS records • Updates or deletes ALLOC and LOGALL records • Deletes IC and CA records past the Clean Up Time • That is, RCU automates the Recon clean-up process
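The effect of the cleanup timestamp on image copy and change accumulation records can be illustrated with a small filter. This is a sketch under stated assumptions, not the IMS Recovery Expert implementation: the record tuples, the function name, and the sample times are hypothetical, and the cleanup time is simply passed in rather than derived from DEALLOC or log stop times.

```python
from datetime import datetime


def records_past_cleanup(records, cleanup_time):
    """Return the IC and CA records that are later than the cleanup time.

    records is a list of (record_type, timestamp) tuples; other record
    types are ignored here because the slide treats them separately.
    """
    return [r for r in records if r[0] in ("IC", "CA") and r[1] > cleanup_time]


# Hypothetical Recon contents and cleanup time.
cleanup_time = datetime(2008, 5, 1, 6, 0)
recon_records = [
    ("IC", datetime(2008, 5, 1, 2, 0)),
    ("CA", datetime(2008, 5, 1, 5, 0)),
    ("IC", datetime(2008, 5, 1, 7, 0)),
    ("ALLOC", datetime(2008, 5, 1, 8, 0)),
]
to_delete = records_past_cleanup(recon_records, cleanup_time)
print(to_delete)
```

Records at or before the cleanup time are kept, so the backup Recon only describes recovery assets that are consistent with the shipped logs.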
  28. 28. Recover DBs to IMS Recovery Expert RP • Remote: Step 3: Recover DBs • Use the cleaned-up Recon to generate recovery JCL with GENJCL.RECOV • Recover DBs from ICs + CAs + SLDS/RLDS [Diagram: RECON + Image Copies + SLDS/RLDS + Change Accumulation + RP]
  29. 29. Recover DBs to IMS Recovery Expert RP • Step 4: Cold Start IMS • IMS RCU deletes all IMS Subsystem Records
  30. 30. Storage Management Mirroring Disaster Recovery 30
  31. 31. Storage Management Mirroring DR • Mirroring DR Solutions • All Production volumes are mirrored to Remote Site • Mirroring can be Synchronous or Asynchronous • Or, a combination of the two strategies • IMS can be Cold Started or Emergency Restarted • Backouts occur during Emergency Restart • Consistency of Data is the key to mirroring • IBM Geographically Dispersed Parallel Sysplex (GDPS) • Optional, helps automate the DR solution
  32. 32. Storage Management Mirroring DR • Geographically Dispersed Parallel Sysplex (GDPS) • Manages: • IBM Metro Mirror (PPRC) • IBM Global Mirror • IBM z/OS Global Mirror (XRC) • Controls remote copy configuration and storage subsystem • Provides automation of sysplex operational tasks • Independent of applications like IMS and DB2 • Includes IBM Services for configuration and manageability • Optional for Mirroring Solutions 32
  33. 33. Consistency of Data: Dependent Writes • Committed Database Update: (1) Log “Before Image” (2) Update Database (3) Log “After Image” • Good Sequence of Writes: (1) only; (1) and (2); (1), (2) and (3) • Bad Sequence of Writes: (1) and (3) only
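The good and bad sequences above follow one rule: the writes present at the secondary must form a prefix of the dependent-write order. The following is a minimal sketch of that rule; the write names and the function are hypothetical labels for the three steps on the slide.

```python
# Dependent-write order for one committed update, as listed on the slide.
DEPENDENT_WRITES = ("log_before_image", "update_database", "log_after_image")


def is_consistent(writes_on_secondary):
    """True if the mirrored writes are a prefix of the dependent-write order."""
    seen_missing = False
    for write in DEPENDENT_WRITES:
        if write in writes_on_secondary:
            if seen_missing:
                return False  # a later write arrived without an earlier one
        else:
            seen_missing = True
    return True


print(is_consistent({"log_before_image"}))                     # good: (1)
print(is_consistent({"log_before_image", "update_database"}))  # good: (1) and (2)
print(is_consistent({"log_before_image", "log_after_image"}))  # bad: (1) and (3) only
```

An empty set also passes the check, since a secondary that has received none of the writes is still consistent, just older.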
  34. 34. Mirroring Environments 34
  35. 35. GDPS/PPRC: IBM Metro Mirror – Synchronous DR Solution – RPO is zero – Emergency restart of IMS – Automation and Freeze policy – Duplexing of CF Structures okay – Distances up to 300 km (with RPQ) [Diagram: Site 1 (CF1, GDPS K-sys K1, P1, P2, disk A, Open) connected by Metro Mirror to Site 2 (CF2, GDPS K-sys K2, CBU, disk B, Open)]
  36. 36. GDPS/PPRC: Consistency • (1) Consistency Group (CG) • Set of volumes holding IMS datasets and logs • When failure occurs for a volume in CG at Remote Site • All writes in CG are held for period of time • (2) GDPS Freeze Automation • FREEZE and GO: • Writes continue at Primary even if failing at Secondary • FREEZE and STOP: • Writes are frozen at Primary and Secondary
  37. 37. GDPS/Global Mirror – Asynchronous DR Solution – RPO can be 3 – 5 seconds (dependent on bandwidth) – RTO: Emergency restart of IMS – Impact on local site response times is negligible – Two-site, long-distance DR and backup remote copy solution [Diagram: Site 1 (CF1, GDPS K-sys K1, P1, P2, disk A, Open) connected by Global Mirror to Site 2 (CF2, GDPS R-sys R, P1, P2, Backups, disks B and F with required FlashCopy, Open)]
  38. 38. GDPS/Global Mirror: Consistency − (1) Primary creates Out-of-Sync Bit Maps: − Shows tracks with new data − (2) Consistency Groups − FlashCopy is required for IBM Global Mirror − FlashCopy is taken before changes are applied − Change Recording bitmap from Out-of-Sync bitmap − Tracks in Change Recording bitmap are Updated 38
  39. 39. GDPS/XRC: z/OS Global Mirror – Asynchronous DR Solution – RPO can be 3 – 5 seconds (dependent on bandwidth) – RTO: Emergency restart of IMS – Scales to any amount of DASD/distance – z/OS Solution [Diagram: Site 1 (CF1, GDPS K-sys K1, P1, P2, disk A) connected by z/OS Global Mirror to Site 2 (CF2, SDM, Kx, P1, P2, disks B and F with optional FlashCopy)]
  40. 40. GDPS/XRC: Consistency − (1) Primary writes are timestamped − Timestamps are stored in Side File cache − (2) System Data Mover (SDM) − z/OS Global Mirror uses SDM to “pull” data to Secondary − SDM uses timestamps in Side File to order writes − Consistency Group is journaled
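The idea of ordering mirrored writes by timestamp can be sketched as a merge across side files. This is an illustration only: the function, the data layout, and in particular the cutoff rule (apply writes only up to the earliest "latest timestamp" seen across all side files) are simplifying assumptions, not a description of the actual System Data Mover.

```python
import heapq


def form_consistency_group(side_files):
    """Merge timestamped writes from several side files into one ordered group.

    side_files maps a (hypothetical) controller name to a list of
    (timestamp, volume, data) tuples already sorted by timestamp.
    Assumed rule: only writes up to the smallest "latest timestamp" across
    all side files are included, so no controller can still be holding an
    earlier write that the group would be missing.
    """
    if not side_files or any(not writes for writes in side_files.values()):
        return []  # some controller has reported nothing, so no safe cutoff exists
    cutoff = min(writes[-1][0] for writes in side_files.values())
    merged = heapq.merge(*side_files.values(), key=lambda w: w[0])
    return [w for w in merged if w[0] <= cutoff]


# Hypothetical writes: log writes on one controller, database writes on another.
group = form_consistency_group({
    "ctl1": [(1, "LOGVOL", "before image"), (4, "LOGVOL", "after image")],
    "ctl2": [(2, "DBVOL", "update"), (3, "DBVOL", "update")],
})
print(group)
```

In this example the write at timestamp 4 is held back for a later group, because the second controller has not yet reported anything at or beyond that time.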
  41. 41. GDPS/PPRC with Global Mirror – Three site mirroring solution: – Metro Mirror for Site 1 & 2 – Global Mirror for Site 2 & 3 – Site 2 RPO near 0 – Site 3 RPO 3-5 secs if Site 1 & 2 lost [Diagram: Site 1 (CF1, GDPS K-sys K1, P1, P2, disk A, Open) connected by Metro Mirror to Site 2 (CF2, GDPS K-sys K2, CBU, disk B, Open) and by Global Mirror to Site 3 (CF3, GDPS R-sys R, P1, P2, Backups, disk C)]
  42. 42. GDPS/PPRC with XRC – Three site mirroring solution: – Metro Mirror for Site 1 & 2 – z/OS Global Mirror for Site 2 & 3 – Site 2 RPO near 0 – Site 3 RPO 3-5 secs if Site 1 & 2 lost [Diagram: Site 1 (CF1, GDPS K-sys K1, P1, P2, disk A, Open) connected by Metro Mirror to Site 2 (CF2, GDPS K-sys K2, CBU, disk B, Open) and by z/OS Global Mirror to Site 3 (CF3, SDM, Kx, P1, P2, disk B)]
  43. 43. Coupling Facility Structures • GDPS/PPRC • Primary and Secondary distance allows for Duplexing • GDPS/Global Mirror, GDPS/XRC • Distance between sites usually prevents Duplexing • All CF structures are allocated during Emergency Restart 43
  44. 44. GDPS Metro Mirror: IMS CF Structures • No Duplexing Needed: • OSAM and VSAM Buffer Pools • Stored on DASD when data is committed • Secondary Site: • All buffers are invalid and structure is rebuilt • IRLM Locks • Secondary site: • Restart backs out inflight trans to release locks • IRLM rebuilds lock structure as empty 44
  45. 45. GDPS Metro Mirror: IMS CF Structures • Good Candidate for Duplexing: • Shared Queues (MSGQ and EMHQ) • VTAM Generic Resources • Structure changes infrequently • Rebuilding can take long time and users must wait • Shared VSO • Store-In Structure: • Committed updates on CF and not on DASD • Recover FP Areas after restart (if not duplexed) • Two Duplexing options: • IMS Managed: IMS creates 2 structures/Area • System Managed (IMS V9+): Multiple Areas per structure 45
  46. 46. Summary • Two Disaster Recovery Strategies for IMS • IMS Application Dependent DR • Storage Management Mirroring • IMS Application Dependent DR • Managing image copies, Recons, Logs at a Remote Site • IBM DM Tools can assist with this DR strategy • Storage Management Mirroring • Production data is mirrored to Remote Site • Creating Consistency is the Key • Mirroring can be Asynchronous or Synchronous • GDPS can be used to automate DR strategy 46
