High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework, SG24-6632 (document transcript)

Front cover

High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

Implementing high availability for ITWS and Tivoli Framework
Windows 2000 Cluster Service and HACMP scenarios
Best practices and tips

Vasfi Gucer, Satoko Egawa, David Oswald, Geoff Pusey, John Webb, Anthony Yen

ibm.com/redbooks

International Technical Support Organization

High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

March 2004

SG24-6632-00

Note: Before using this information and the product it supports, read the information in "Notices" on page vii.

First Edition (March 2004)

This edition applies to IBM Tivoli Workload Scheduler Version 8.2 and IBM Tivoli Management Framework Version 4.1.

© Copyright International Business Machines Corporation 2004. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents

Notices  vii
Trademarks  viii

Preface  ix
The team that wrote this redbook  ix
Become a published author  xi
Comments welcome  xi

Chapter 1. Introduction  1
1.1 IBM Tivoli Workload Scheduler architectural overview  2
1.2 IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework  4
1.3 High availability terminology used in this book  7
1.4 Overview of clustering technologies  8
1.4.1 High availability versus fault tolerance  8
1.4.2 Server versus job availability  10
1.4.3 Standby versus takeover configurations  12
1.4.4 IBM HACMP  16
1.4.5 Microsoft Cluster Service  21
1.5 When to implement IBM Tivoli Workload Scheduler high availability  24
1.5.1 High availability solutions versus Backup Domain Manager  24
1.5.2 Hardware failures to plan for  26
1.5.3 Summary  27
1.6 Material covered in this book  27

Chapter 2. High level design and architecture  31
2.1 Concepts of high availability clusters  32
2.1.1 A bird's-eye view of high availability clusters  32
2.1.2 Software considerations  39
2.1.3 Hardware considerations  41
2.2 Hardware configurations  43
2.2.1 Types of hardware cluster  43
2.2.2 Hot standby system  46
2.3 Software configurations  46
2.3.1 Configurations for implementing IBM Tivoli Workload Scheduler in a cluster  46
2.3.2 Software availability within IBM Tivoli Workload Scheduler  57
2.3.3 Load balancing software  59
2.3.4 Job recovery  60

Chapter 3. High availability cluster implementation  63
3.1 Our high availability cluster scenarios  64
3.1.1 Mutual takeover for IBM Tivoli Workload Scheduler  64
3.1.2 Hot standby for IBM Tivoli Management Framework  66
3.2 Implementing an HACMP cluster  67
3.2.1 HACMP hardware considerations  67
3.2.2 HACMP software considerations  67
3.2.3 Planning and designing an HACMP cluster  67
3.2.4 Installing HACMP 5.1 on AIX 5.2  92
3.3 Implementing a Microsoft Cluster  138
3.3.1 Microsoft Cluster hardware considerations  139
3.3.2 Planning and designing a Microsoft Cluster installation  139
3.3.3 Microsoft Cluster Service installation  141

Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster  183
4.1 Implementing IBM Tivoli Workload Scheduler in an HACMP cluster  184
4.1.1 IBM Tivoli Workload Scheduler implementation overview  184
4.1.2 Preparing to install  188
4.1.3 Installing the IBM Tivoli Workload Scheduler engine  191
4.1.4 Configuring the IBM Tivoli Workload Scheduler engine  192
4.1.5 Installing IBM Tivoli Workload Scheduler Connector  194
4.1.6 Setting the security  198
4.1.7 Add additional IBM Tivoli Workload Scheduler Connector instance  201
4.1.8 Verify IBM Tivoli Workload Scheduler behavior in HACMP cluster  202
4.1.9 Applying IBM Tivoli Workload Scheduler fix pack  204
4.1.10 Configure HACMP for IBM Tivoli Workload Scheduler  210
4.1.11 Add IBM Tivoli Management Framework  303
4.1.12 Production considerations  340
4.1.13 Just one IBM Tivoli Workload Scheduler instance  345
4.2 Implementing IBM Tivoli Workload Scheduler in a Microsoft Cluster  347
4.2.1 Single instance of IBM Tivoli Workload Scheduler  347
4.2.2 Configuring the cluster group  379
4.2.3 Two instances of IBM Tivoli Workload Scheduler in a cluster  383
4.2.4 Installation of the IBM Tivoli Management Framework  396
4.2.5 Installation of Job Scheduling Services  401
4.2.6 Installation of Job Scheduling Connector  402
4.2.7 Creating Connector instances  405
4.2.8 Interconnecting the two Tivoli Framework Servers  405
4.2.9 Installing the Job Scheduling Console  408
4.2.10 Scheduled outage configuration  410

Chapter 5. Implement IBM Tivoli Management Framework in a cluster  415
5.1 Implement IBM Tivoli Management Framework in an HACMP cluster  416
5.1.1 Inventory hardware  417
5.1.2 Planning the high availability design  418
5.1.3 Create the shared disk volume  420
5.1.4 Install IBM Tivoli Management Framework  453
5.1.5 Tivoli Web interfaces  464
5.1.6 Tivoli Managed Node  464
5.1.7 Tivoli Endpoints  466
5.1.8 Configure HACMP  480
5.2 Implementing Tivoli Framework in a Microsoft Cluster  503
5.2.1 TMR server  503
5.2.2 Tivoli Managed Node  536
5.2.3 Tivoli Endpoints  555

Appendix A. A real-life implementation  571
Rationale for IBM Tivoli Workload Scheduler and HACMP integration  572
Our environment  572
Installation roadmap  573
Software configuration  574
Hardware configuration  575
Installing the AIX operating system  576
Finishing the network configuration  577
Creating the TTY device within AIX  577
Testing the heartbeat interface  578
Configuring shared disk storage devices  579
Copying installation code to shared storage  580
Creating user accounts  581
Creating group accounts  581
Installing IBM Tivoli Workload Scheduler software  581
Installing HACMP software  582
Installing the Tivoli TMR software  583
Patching the Tivoli TMR software  583
TMR versus Managed Node installation  583
Configuring IBM Tivoli Workload Scheduler start and stop scripts  584
Configuring miscellaneous start and stop scripts  584
Creating and modifying various system files  585
Configuring the HACMP environment  585
Testing the failover procedure  585
HACMP Cluster topology  586
HACMP Cluster Resource Group topology  588
ifconfig -a  589
Skills required to implement IBM Tivoli Workload Scheduling/HACMP  590
Observations and questions  594

Appendix B. TMR clustering for Tivoli Framework 3.7b on MSCS  601
Setup  602
Configure the wlocalhost  602
Install Framework on the primary node  602
Install Framework on the secondary node  603
Configure the TMR  603
Set the root administrators login  603
Force the oserv to bind to the virtual IP  603
Change the name of the DBDIR  604
Modify the setup_env.cmd and setup_env.sh  604
Configure the registry  604
Rename the Managed Node  604
Rename the TMR  605
Rename the top-level policy region  605
Rename the root administrator  605
Configure the ALIDB  606
Create the cluster resources  606
Create the oserv cluster resource  606
Create the trip cluster resource  606
Set up the resource dependencies  607
Validate and backup  607
Test failover  607
Back up the Tivoli databases  607

Abbreviations and acronyms  609

Related publications  611
IBM Redbooks  611
Other publications  611
Online resources  612
How to get IBM Redbooks  613

Index  615
Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.
Trademarks

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

AFS®, AIX®, Balance®, DB2®, DFS™, Enterprise Storage Server®, eServer™, IBM®, LoadLeveler®, Maestro™, NetView®, Planet Tivoli®, PowerPC®, pSeries®, Redbooks™, Redbooks (logo)™, RS/6000®, SAA®, Tivoli®, Tivoli Enterprise™, TotalStorage®, WebSphere®, z/OS®

The following terms are trademarks of other companies:

Intel, Intel Inside (logos), and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.

Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Other company, product, and service names may be trademarks or service marks of others.
Preface

This IBM® Redbook is intended to be used as a major reference for designing and creating highly available IBM Tivoli® Workload Scheduler and Tivoli Framework environments.

IBM Tivoli Workload Scheduler Version 8.2 is the IBM strategic scheduling product that runs on many different platforms, including the mainframe. Here, we describe how to install ITWS Version 8.2 in a high availability (HA) environment and configure it to meet high availability requirements. The focus is on the IBM Tivoli Workload Scheduler Version 8.2 Distributed product, although some issues specific to Version 8.1 and IBM Tivoli Workload Scheduler for z/OS® are also briefly covered.

When implementing a highly available IBM Tivoli Workload Scheduler environment, you have to consider high availability for both IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework environments, because IBM Tivoli Workload Scheduler uses IBM Tivoli Management Framework's services for authentication. Therefore, we discuss techniques you can use to successfully implement IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework (TMR server, Managed Nodes and Endpoints), and we present two major case studies: High-Availability Cluster Multiprocessing (HACMP) for AIX®, and Microsoft® Windows® Cluster Service.

The implementation of IBM Tivoli Workload Scheduler within a high availability environment will vary from platform to platform and from customer to customer, based on the needs of the installation. Here, we cover the most common scenarios and share practical implementation tips. We also make recommendations for other high availability platforms; although there are many different clustering technologies in the market today, they are similar enough to allow us to offer useful advice regarding the implementation of a highly available scheduling system.

Finally, although we basically cover highly available scheduling systems, we also offer a section for customers who want to implement a highly available IBM Tivoli Management Framework environment, but who are not currently using IBM Tivoli Workload Scheduler.

The team that wrote this redbook

This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization, Austin Center.

Vasfi Gucer is an IBM Certified Consultant IT Specialist at the ITSO Austin Center. He has been with IBM Turkey for 10 years, and has worked at the ITSO since January 1999. He has more than 10 years of experience in systems management, networking hardware, and distributed platform software. He has worked on various Tivoli customer projects as a Systems Architect and Consultant in Turkey and in the United States, and is also a Certified Tivoli Consultant.

Satoko Egawa is an I/T Specialist with IBM Japan. She has five years of experience in systems management solutions. Her area of expertise is job scheduling solutions using Tivoli Workload Scheduler. She is also a Tivoli Certified Consultant, and in the past has worked closely with the Tivoli Rome Lab.

David Oswald is a Certified IBM Tivoli Services Specialist in New Jersey, United States, who works on IBM Tivoli Workload Scheduling and Tivoli storage architectures/deployments (TSRM, TSM, TSANM) for IBM customers located in the United States, Europe, and Latin America. He has been involved in disaster recovery, UNIX administration, shell scripting and automation for 17 years, and has worked with TWS Versions 5.x, 6.x, 7.x, and 8.x. While primarily a Tivoli services consultant, he is also involved in Tivoli course development, Tivoli certification exams, and Tivoli training efforts.

Geoff Pusey is a Senior I/T Specialist in the IBM Tivoli Services EMEA region. He is a Certified IBM Tivoli Workload Scheduler Consultant and has been with Tivoli/IBM since January 1998, when Unison Software was acquired by Tivoli Systems. He has worked with the IBM Tivoli Workload Scheduling product for the last 10 years as a consultant, performing customer training, implementing and customizing IBM Tivoli Workload Scheduler, creating customized scripts to generate specific reports, and enhancing IBM Tivoli Workload Scheduler with new functions.

John Webb is a Senior Consultant for Tivoli Services Latin America. He has been with IBM since 1998. Since joining IBM, John has made valuable contributions to the company through his knowledge and expertise in enterprise systems management. He has deployed and designed systems for numerous customers, and his areas of expertise include the Tivoli Framework and Tivoli PACO products.

Anthony Yen is a Senior IT Consultant with IBM Business Partner Automatic IT Corporation, <http://www.AutomaticIT.com>, in Austin, Texas, United States. He has delivered 19 projects involving 11 different IBM Tivoli products over the past six years. His areas of expertise include Enterprise Console, Monitoring, Workload Scheduler, Configuration Manager, Remote Control, and NetView®. He has given talks at Planet Tivoli® and Automated Systems And Planning OPC and TWS Users Conference (ASAP), and has taught courses on IBM Tivoli Workload Scheduler. Before that, he worked in the IT industry for 10 years as a UNIX and Windows system administrator. He has been an IBM Certified Tivoli Consultant since 1998.

Thanks to the following people for their contributions to this project:

Octavian Lascu, Dino Quintero
International Technical Support Organization, Poughkeepsie Center

Jackie Biggs, Warren Gill, Elaine Krakower, Tina Lamacchia, Grant McLaughlin, Nick Lopez
IBM USA

Antonio Gallotti
IBM Italy

Become a published author

Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners and/or customers.

Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs, and increase your productivity and marketability.

Find out more about the residency program, browse the residency index, and apply online at:

ibm.com/Redbooks/residencies.html

Comments welcome

Your comments are important to us! We want our Redbooks™ to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:

- Use the online Contact us review Redbook form found at: ibm.com/Redbooks
- Send your comments in an Internet note to: Redbook@us.ibm.com
- Mail your comments to: IBM Corporation, International Technical Support Organization, Dept. JN9B Building 003 Internal Zip 2834, 11400 Burnet Road, Austin, Texas 78758-3493
Chapter 1. Introduction

In this chapter, we introduce the IBM Tivoli Workload Scheduler suite and identify the need for high availability by IBM Tivoli Workload Scheduler users. Important ancillary concepts in IBM Tivoli Management Framework (also referred to as Tivoli Framework, or TMF) and clustering technologies are introduced for new users as well.

The following topics are covered in this chapter:

- "IBM Tivoli Workload Scheduler architectural overview" on page 2
- "IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework" on page 4
- "High availability terminology used in this book" on page 7
- "Overview of clustering technologies" on page 8
- "When to implement IBM Tivoli Workload Scheduler high availability" on page 24
- "Material covered in this book" on page 27

1.1 IBM Tivoli Workload Scheduler architectural overview

IBM Tivoli Workload Scheduler Version 8.2 is the IBM strategic scheduling product that runs on many different platforms, including the mainframe. This redbook covers installing ITWS Version 8.2 in a high availability (HA) environment and configuring it to meet high availability requirements. The focus is on the IBM Tivoli Workload Scheduler Version 8.2 Distributed product, although some issues specific to Version 8.1 and IBM Tivoli Workload Scheduler for z/OS are also briefly covered.

Understanding specific aspects of IBM Tivoli Workload Scheduler's architecture is key to a successful high availability implementation. In-depth knowledge of the architecture is necessary for resolving some problems that might present themselves during the deployment of IBM Tivoli Workload Scheduler in an HA environment. We will only identify those aspects of the architecture that are directly involved with a high availability deployment. For a detailed discussion of IBM Tivoli Workload Scheduler's architecture, refer to Chapter 2, "Overview", in IBM Tivoli Workload Scheduling Suite Version 8.2, General Information, SC32-1256.

IBM Tivoli Workload Scheduler uses the TCP/IP-based network connecting an enterprise's servers to accomplish its mission of scheduling jobs. A job is an executable file, program, or command that is scheduled and launched by IBM Tivoli Workload Scheduler. All servers that run jobs using IBM Tivoli Workload Scheduler make up the scheduling network. A scheduling network contains at least one domain, the master domain, in which a server designated as the Master Domain Manager (MDM) is the management hub. This server contains the definitions of all scheduling objects that define the batch schedule, stored in a database. Additional domains can be used to divide a widely distributed network into smaller, locally managed groups. The management hubs for these additional domains are called Domain Manager servers.

Each server in the scheduling network is called a workstation, or by the interchangeable term CPU. There are different types of workstations that serve different roles. For the purposes of this publication, it is sufficient to understand that a workstation can be one of the following types. You have already been introduced to one of them, the Master Domain Manager. The other types of workstations are Domain Manager (DM) and Fault Tolerant Agent (FTA). Figure 1-1 on page 3 shows the relationship between these architectural elements in a sample scheduling network.
Figure 1-1  Main architectural elements of IBM Tivoli Workload Scheduler relevant to high availability

The lines between the workstations show how IBM Tivoli Workload Scheduler communicates between them. For example, if the MDM needs to send a command to FTA2, it would pass the command via DM_A. In this example scheduling network, the Master Domain Manager is the management hub for two Domain Managers, DM_A and DM_B. Each Domain Manager in turn is the management hub for two Fault Tolerant Agents. DM_A is the hub for FTA1 and FTA2, and DM_B is the hub for FTA3 and FTA4.

IBM Tivoli Workload Scheduler operations revolve around a production day, a 24-hour cycle initiated by a job called Jnextday that runs on the Master Domain Manager. Interrupting or delaying this process has serious ramifications for the proper functioning of the scheduling network.

Based upon this architecture, we determined that making IBM Tivoli Workload Scheduler highly available requires configuring at least the Master Domain Manager server for high availability. This delivers high availability of the scheduling object definitions. In some sites, even the Domain Manager and Fault Tolerant Agent servers are configured for high availability, depending upon specific business requirements.
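For readers who have not seen ITWS scheduling object definitions before, the workstation roles shown in Figure 1-1 are described to the product through composer workstation definitions along these lines. This is only an illustrative sketch: the host names, TCP port, and domain name are assumptions, and the attribute set shown is not complete.

    cpuname DM_A
      description "Domain manager for DomainA"
      os UNIX
      node dm-a.example.com       # assumed host name
      tcpaddr 31111
      domain DomainA
      for maestro
        type MANAGER
        autolink ON
        fullstatus ON
      end

    cpuname FTA2
      os WNT
      node fta2.example.com       # assumed host name
      tcpaddr 31111
      domain DomainA
      for maestro
        type FTA
        autolink ON
        fullstatus OFF
      end

In cluster scenarios such as those built later in this book, the node attribute of a highly available workstation would normally point at the service (virtual) IP label that moves with the resource group rather than at a physical node name.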
1.2 IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework

IBM Tivoli Workload Scheduler provides out-of-the-box integration with up to six other IBM products:

- IBM Tivoli Management Framework
- IBM Tivoli Business Systems Manager
- IBM Tivoli Enterprise Console
- IBM Tivoli NetView
- IBM Tivoli Distributed Monitoring (Classic Edition)
- IBM Tivoli Enterprise Data Warehouse

Other IBM Tivoli products, such as IBM Tivoli Configuration Manager, can also be integrated with IBM Tivoli Workload Scheduler but require further configuration not provided out of the box.

Best practices call for implementing IBM Tivoli Management Framework on the same Master Domain Manager server used by IBM Tivoli Workload Scheduler. Figure 1-2 on page 5 shows a typical configuration of all six products, hosted on five servers (IBM Tivoli Business Systems Manager is often hosted on two separate servers).

Figure 1-2  Typical site configuration of all Tivoli products that can be integrated with IBM Tivoli Workload Scheduler out of the box

In this redbook, we show how to configure IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework for high availability, corresponding to the upper left server in the preceding example site configuration. Sites that want to implement other products on an IBM Tivoli Workload Scheduler Master Domain Manager server for high availability should consult their IBM service provider.

IBM Tivoli Workload Scheduler uses IBM Tivoli Management Framework to deliver authentication services for the Job Scheduling Console GUI client, and to communicate with the Job Scheduling Console in general. Two components are used within IBM Tivoli Management Framework to accomplish these responsibilities: the Connector, and Job Scheduling Services (JSS). These components are only required on the Master Domain Manager server. For the purposes of this redbook, be aware that high availability of IBM Tivoli Workload Scheduler requires proper configuration of IBM Tivoli Management Framework, all Connector instances, and the Job Scheduling Services component. Figure 1-3 on page 6 shows the relationships between IBM Tivoli Management Framework, the Job Scheduling Services component, the IBM Tivoli Workload Scheduler job scheduling engine, and the Job Scheduling Console.

Figure 1-3  Relationship between major components of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework

In this example, Job Scheduling Console instances on three laptops are connected to a single instance of IBM Tivoli Management Framework. This instance of IBM Tivoli Management Framework serves two different scheduling networks called Production_A and Production_B via two Connectors called Connector_A and Connector_B. Note that there is only ever one instance of the Job Scheduling Services component no matter how many instances of the Connector and Job Scheduling Console exist in the environment.

It is possible to install IBM Tivoli Workload Scheduler without using the Connector and Job Scheduling Services components. However, without these components the benefits of the Job Scheduling Console cannot be realized. This is only an option if a customer is willing to perform all operations from just the command line interface.

In high availability contexts, both IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework are typically deployed in a high availability environment. In this redbook, we will show how to deploy IBM Tivoli Workload Scheduler both with and without IBM Tivoli Management Framework.
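As a point of reference, a Connector instance for a scheduling network such as Production_A is normally created on the Master Domain Manager with the wtwsconn.sh script that ships with the ITWS Connector, and the resulting instances can be listed through the Framework. The following sketch is illustrative only: the instance name and installation path are assumptions, and the option names should be verified against your level of the product.

    # Create a Connector instance named Production_A that points at the
    # ITWS installation directory on this node (path is an assumption)
    ./wtwsconn.sh -create -n Production_A -t /usr/local/tws/maestro

    # List the Connector (MaestroEngine) instances registered in the
    # Tivoli Management Framework object database
    wlookup -ar MaestroEngine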
1.3 High availability terminology used in this book

It helps to share a common terminology for concepts used in this redbook. The high availability field often uses multiple terms for the same concept, but in this redbook, we adhere to conventions set by International Business Machines Corporation whenever possible.

Cluster: This refers to a group of servers configured for high availability of one or more applications.

Node: This refers to a single server in a cluster.

Primary: This refers to a node that initially runs an application when a cluster is started.

Backup: This refers to one or more nodes that are designated as the servers an application will be migrated to if the application's primary node fails.

Joining: This refers to the process of a node announcing its availability to the cluster.

Fallover: This refers to the process of a backup node taking over an application from a failed primary node.

Reintegration: This refers to the process of a failed primary node that was repaired rejoining a cluster. Note that the primary node's application does not necessarily have to migrate back to the primary node. See fallback.

Fallback: This refers to the process of migrating an application from a backup node to a primary node. Note that the primary node does not have to be the original primary node (for example, it can be a new node that joins the cluster).

For more terms commonly used when configuring high availability, refer to High Availability Cluster Multi-Processing for AIX Master Glossary, Version 5.1, SC23-4867.

1.4 Overview of clustering technologies

In this section we give an overview of clustering technologies with respect to high availability. A cluster is a group of loosely coupled machines networked together, sharing disk resources. While clusters can be used for more than just their high availability benefits (like cluster multi-processing), in this document we are only concerned with illustrating the high availability benefits; consult your IBM service provider for information about how to take advantage of the other benefits of clusters for IBM Tivoli Workload Scheduler.

Clusters provide a highly available environment for mission-critical applications. For example, a cluster could run a database server program which services client applications on other systems. Clients send queries to the server program, which responds to their requests by accessing a database stored on a shared external disk. A cluster takes measures to ensure that the applications remain available to client processes even if a component in a cluster fails. To ensure availability, in case of a component failure, a cluster moves the application (along with resources that ensure access to the application) to another node in the cluster.

1.4.1 High availability versus fault tolerance

It is important for you to understand that we are detailing how to install IBM Tivoli Workload Scheduler in a highly available, but not a fault-tolerant, configuration.

Fault tolerance relies on specialized hardware to detect a hardware fault and instantaneously switch to a redundant hardware component (whether the failed component is a processor, memory board, power supply, I/O subsystem, or storage subsystem). Although this cut-over is apparently seamless and offers non-stop service, a high premium is paid in both hardware cost and performance because the redundant components do no processing. More importantly, the fault-tolerant model does not address software failures, by far the most common reason for downtime.

High availability views availability not as a series of replicated physical components, but rather as a set of system-wide, shared resources that cooperate to guarantee essential services. High availability combines software with industry-standard hardware to minimize downtime by quickly restoring essential services when a system, component, or application fails. While not instantaneous, services are restored rapidly, often in less than a minute.
The difference between fault tolerance and high availability, then, is this: a fault-tolerant environment has no service interruption, while a highly available environment has a minimal service interruption. Many sites are willing to absorb a small amount of downtime with high availability rather than pay the much higher cost of providing fault tolerance. Additionally, in most highly available configurations, the backup processors are available for use during normal operation.

High availability systems are an excellent solution for applications that can withstand a short interruption should a failure occur, but which must be restored quickly. Some industries have applications so time-critical that they cannot withstand even a few seconds of downtime. Many other industries, however, can withstand small periods of time when their database is unavailable. For those industries, HACMP can provide the necessary continuity of service without total redundancy.

Figure 1-4 shows the costs and benefits of availability technologies.

Figure 1-4  Cost and benefits of availability technologies

As you can see, availability is not an all-or-nothing proposition. Think of availability as a continuum. Reliable hardware and software provide the base level of availability. Advanced features such as RAID devices provide an enhanced level of availability. High availability software provides near-continuous access to data and applications. Fault-tolerant systems ensure the constant availability of the entire system, but at a higher cost.

1.4.2 Server versus job availability

You should also be aware of the difference between availability of the server and availability of the jobs the server runs. This redbook shows how to implement a highly available server. Ensuring the availability of the jobs is addressed on a job-by-job basis.

For example, Figure 1-5 shows a production day with four job streams, labeled A, B, C and D. In this example, a failure incident occurs in between job streams B and D, during a period of the production day when no other job streams are running.

Figure 1-5  Example disaster recovery incident where no job recovery is required

Because no jobs or job streams are running at the moment of the failure, making IBM Tivoli Workload Scheduler itself highly available is sufficient to bring back scheduling services. No recovery of interrupted jobs is required.

Now suppose that job streams B and D must complete before a database change is committed. If the failure happened during job stream D as in Figure 1-6 on page 11, then before IBM Tivoli Workload Scheduler is restarted on a new server, the database needs to be rolled back so that when job stream B is restarted, it will not corrupt the database.
Figure 1-6  Example disaster recovery incident where job recovery not related to IBM Tivoli Workload Scheduler is required

This points out some important observations about high availability with IBM Tivoli Workload Scheduler. It is your responsibility to ensure that the application-specific business logic of your application is preserved across a disaster incident. For example, IBM Tivoli Workload Scheduler cannot know that a database needs to be rolled back before a job stream is restarted as part of a high availability recovery.

Knowing what job streams and jobs to restart after IBM Tivoli Workload Scheduler falls over to a backup server is dependent upon the specific business logic of your production plan. In fact, it is critical to the success of a recovery effort that the precise state of the production day at the moment of failure is communicated to the team performing the recovery.

Let's look at Figure 1-7 on page 12, which illustrates an even more complex situation: multiple job streams are interrupted, each requiring its own, separate recovery activity.

Figure 1-7  Example disaster recovery incident requiring multiple, different job recovery actions

The recovery actions for job stream A in this example are different from the recovery actions for job stream B. In fact, depending upon the specifics of what your jobs and job streams run, the recovery actions that a job stream requires after a disaster incident could differ depending upon which jobs in the job stream finished before the failure.

The scenario this redbook is most directly applicable to is restarting an IBM Tivoli Workload Scheduler Master Domain Manager server on a highly available cluster where no job streams other than FINAL are executed. The contents of this redbook can also be applied to Master Domain Manager, Domain Manager, and Fault Tolerant Agent servers that run job streams requiring specific recovery actions as part of a high availability recovery. But implementing these scenarios requires simultaneous implementation of high availability for the individual jobs. The exact details of such implementations are specific to your jobs, and cannot be generalized in a "cookbook" manner. If high availability at the job level is an important criterion, your IBM service provider can help you to implement it.
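Some of this job-level recovery logic can be expressed directly in the job definitions themselves. The composer job definition below is only a sketch: the workstation, job, script, and logon names are invented for illustration, and the RECOVERY clause handles a job that abends within the plan, not a cluster fallover.

    $jobs
    MDM#DB_COMMIT
     scriptname "/prod/batch/db_commit.sh"
     streamlogon prodbatch
     description "Commits the nightly database changes"
     recovery stop after MDM#DB_ROLLBACK

Used this way, part of the cleanup runs under ITWS control when the job abends, which can reduce the manual recovery work described above.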
1.4.3 Standby versus takeover configurations

There are two basic types of cluster configurations:

Standby: This is the traditional redundant hardware configuration. One or more standby nodes are set aside idling, waiting for a primary server in the cluster to fail. This is also known as hot standby.

Takeover: In this configuration, all cluster nodes process part of the cluster's workload. No nodes are set aside as standby nodes. When a primary node fails, one of the other nodes assumes the workload of the failed node in addition to its existing primary workload. This is also known as mutual takeover.

Typically, implementations of both configurations will involve shared resources. Disks or mass storage like a Storage Area Network (SAN) are most frequently configured as a shared resource.

Figure 1-8 shows a standby configuration in normal operation, where Node A is the primary node, and Node B is the standby node and currently idling. While Node B has a connection to the shared mass storage resource, it is not active during normal operation.

Figure 1-8  Standby configuration in normal operation

After Node A falls over to Node B, the connection to the mass storage resource from Node B will be activated, and because Node A is unavailable, its connection to the mass storage resource is inactive. This is shown in Figure 1-9 on page 14.

Figure 1-9  Standby configuration in fallover operation

By contrast, a takeover configuration of this environment accesses the shared disk resource at the same time. For IBM Tivoli Workload Scheduler high availability configurations, this usually means that the shared disk resource has separate, logical filesystem volumes, each accessed by a different node. This is illustrated by Figure 1-10 on page 15.

Figure 1-10  Takeover configuration in normal operation

During normal operation of this two-node highly available cluster in a takeover configuration, the filesystem Node A FS is accessed by App 1 on Node A, while the filesystem Node B FS is accessed by App 2 on Node B. If either node fails, the other node will take on the workload of the failed node. For example, if Node A fails, App 1 is restarted on Node B, and Node B opens a connection to filesystem Node A FS. This fallover scenario is illustrated by Figure 1-11 on page 16.
Figure 1-11  Takeover configuration in fallover operation

Takeover configurations are more efficient with hardware resources than standby configurations because there are no idle nodes. Performance can degrade after a node failure, however, because the overall load on the remaining nodes increases. In this redbook we will be showing how to configure IBM Tivoli Workload Scheduler for takeover high availability.

1.4.4 IBM HACMP

The IBM tool for building UNIX-based, mission-critical computing platforms is the HACMP software. The HACMP software ensures that critical resources, such as applications, are available for processing. HACMP has two major components: high availability (HA) and cluster multi-processing (CMP). In this document we focus upon the HA component.

The primary reason to create HACMP Clusters is to provide a highly available environment for mission-critical applications. For example, an HACMP Cluster could run a database server program that services client applications. The clients send queries to the server program, which responds to their requests by accessing a database stored on a shared external disk.

In an HACMP Cluster, to ensure the availability of these applications, the applications are put under HACMP control. HACMP takes measures to ensure that the applications remain available to client processes even if a component in a cluster fails. To ensure availability, in case of a component failure, HACMP moves the application (along with resources that ensure access to the application) to another node in the cluster.
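In practice, HACMP starts and stops an application it controls through administrator-supplied application server scripts; Chapter 4 builds these for IBM Tivoli Workload Scheduler in detail. The fragments below are only a rough sketch of that idea, with an assumed installation path and user name.

    #!/bin/ksh
    # start_tws - sketch of an HACMP application server start script for ITWS
    TWS_HOME=/usr/local/tws/maestro          # assumed path on the shared volume group
    su - tws -c "$TWS_HOME/StartUp"          # StartUp brings up netman, which starts the rest
    exit 0

    #!/bin/ksh
    # stop_tws - sketch of the matching stop script
    TWS_HOME=/usr/local/tws/maestro
    su - tws -c "$TWS_HOME/bin/conman 'unlink @;noask'"
    su - tws -c "$TWS_HOME/bin/conman 'stop;wait'"
    su - tws -c "$TWS_HOME/bin/conman 'shut;wait'"   # stops netman as well
    exit 0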
Benefits

HACMP helps you with each of the following:

- The HACMP planning process and documentation include tips and advice on the best practices for installing and maintaining a highly available HACMP Cluster.
- Once the cluster is operational, HACMP provides the automated monitoring and recovery for all the resources on which the application depends.
- HACMP provides a full set of tools for maintaining the cluster, while keeping the application available to clients.

HACMP lets you:

- Set up an HACMP environment using online planning worksheets to simplify initial planning and setup.
- Ensure high availability of applications by eliminating single points of failure in an HACMP environment.
- Leverage high availability features available in AIX.
- Manage how a cluster handles component failures.
- Secure cluster communications.
- Set up fast disk takeover for volume groups managed by the Logical Volume Manager (LVM).
- Manage event processing for an HACMP environment.
- Monitor HACMP components and diagnose problems that may occur.

For a general overview of all HACMP features, see the IBM Web site:

http://www-1.ibm.com/servers/aix/products/ibmsw/high_avail_network/hacmp.html

Enhancing availability with the AIX software

HACMP takes advantage of the features in AIX, which is the high-performance UNIX operating system.

AIX Version 5.1 adds new functionality to further improve security and system availability. This includes improved availability of mirrored data and enhancements to Workload Manager that help solve problems of mixed workloads by dynamically providing resource availability to critical applications. Used with IBM pSeries® servers, HACMP can provide both horizontal and vertical scalability, without downtime.

The AIX operating system provides numerous features designed to increase system availability by lessening the impact of both planned (data backup, system administration) and unplanned (hardware or software failure) downtime. These features include:

- Journaled File System and Enhanced Journaled File System
- Disk mirroring
- Process control
- Error notification

The IBM HACMP software provides a low-cost commercial computing environment that ensures that mission-critical applications can recover quickly from hardware and software failures. The HACMP software is a high availability system that ensures that critical resources are available for processing. High availability combines custom software with industry-standard hardware to minimize downtime by quickly restoring services when a system, component, or application fails. While not instantaneous, the restoration of service is rapid, usually 30 to 300 seconds.

Physical components of an HACMP Cluster

HACMP provides a highly available environment by identifying a set of resources essential to uninterrupted processing, and by defining a protocol that nodes use to collaborate to ensure that these resources are available. HACMP extends the clustering model by defining relationships among cooperating processors where one processor provides the service offered by a peer, should the peer be unable to do so.

An HACMP Cluster is made up of the following physical components:

- Nodes
- Shared external disk devices
- Networks
- Network interfaces
- Clients

The HACMP software allows you to combine physical components into a wide range of cluster configurations, providing you with flexibility in building a cluster that meets your processing requirements. Figure 1-12 on page 19 shows one example of an HACMP Cluster. Other HACMP Clusters could look very different, depending on the number of processors, the choice of networking and disk technologies, and so on.

Figure 1-12  Example HACMP Cluster

Nodes

Nodes form the core of an HACMP Cluster. A node is a processor that runs both AIX and the HACMP software. The HACMP software supports pSeries uniprocessor and symmetric multiprocessor (SMP) systems, and the Scalable POWERParallel processor (SP) systems as cluster nodes. To the HACMP software, an SMP system looks just like a uniprocessor. SMP systems provide a cost-effective way to increase cluster throughput. Each node in the cluster can be a large SMP machine, extending an HACMP Cluster far beyond the limits of a single system and allowing thousands of clients to connect to a single database.
    • In an HACMP Cluster, up to 32 RS/6000® or pSeries stand-alone systems, pSeries divided into LPARS, SP nodes, or a combination of these cooperate to provide a set of services or resources to other entities. Clustering these servers to back up critical applications is a cost-effective high availability option. A business can use more of its computing power, while ensuring that its critical applications resume running after a short interruption caused by a hardware or software failure. In an HACMP Cluster, each node is identified by a unique name. A node may own a set of resources (disks, volume groups, filesystems, networks, network addresses, and applications). Typically, a node runs a server or a “back-end” application that accesses data on the shared external disks. The HACMP software supports from 2 to 32 nodes in a cluster, depending on the disk technology used for the shared external disks. A node in an HACMP Cluster has several layers of software components. Shared external disk devices Each node must have access to one or more shared external disk devices. A shared external disk device is a disk physically connected to multiple nodes. The shared disk stores mission-critical data, typically mirrored or RAID-configured for data redundancy. A node in an HACMP Cluster must also have internal disks that store the operating system and application binaries, but these disks are not shared. Depending on the type of disk used, the HACMP software supports two types of access to shared external disk devices: non-concurrent access, and concurrent access. In non-concurrent access environments, only one connection is active at any given time, and the node with the active connection owns the disk. When a node fails, disk takeover occurs when the node that currently owns the disk leaves the cluster and a surviving node assumes ownership of the shared disk. This is what we show in this redbook. In concurrent access environments, the shared disks are actively connected to more than one node simultaneously. Therefore, when a node fails, disk takeover is not required. We do not show this here because concurrent access does not support the use of the Journaled File System (JFS), and JFS is required to use either IBM Tivoli Workload Scheduler or IBM Tivoli Management Framework. Networks As an independent, layered component of AIX, the HACMP software is designed to work with any TCP/IP-based network. Nodes in an HACMP Cluster use the network to allow clients to access the cluster nodes, enable cluster nodes to20 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • exchange heartbeat messages and, in concurrent access environments, serialize access to data. The HACMP software has been tested with Ethernet, Token-Ring, ATM, and other networks. The HACMP software defines two types of communication networks, characterized by whether these networks use communication interfaces based on the TCP/IP subsystem (TCP/IP-based), or communication devices based on non-TCP/IP subsystems (device-based). Clients A client is a processor that can access the nodes in a cluster over a local area network. Clients each run a front-end or client application that queries the server application running on the cluster node. The HACMP software provides a highly available environment for critical data and applications on cluster nodes. Note that the HACMP software does not make the clients themselves highly available. AIX clients can use the Client Information (Clinfo) services to receive notice of cluster events. Clinfo provides an API that displays cluster status information. The /usr/es/sbin/cluster/clstat utility, a Clinfo client shipped with the HACMP software, provides information about all cluster service interfaces. The clients for IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework are the Job Scheduling Console and the Tivoli Desktop applications, respectively. These clients do not support the Clinfo API, but feedback that the cluster server is not available is immediately provided within these clients.1.4.5 Microsoft Cluster Service Microsoft Cluster Service (MSCS) provides three primary services: Availability Continue providing a service even during hardware or software failure. This redbook focuses upon leveraging this feature of MSCS. Scalability Enable additional components to be configured as system load increases. Simplification Manage groups of systems and their applications as a single system. MSCS is a built-in feature of Windows NT/2000 Server Enterprise Edition. It is software that supports the connection of two servers into a cluster for higher availability and easier manageability of data and applications. MSCS can automatically detect and recover from server or application failures. It can be used to move server workload to balance utilization and to provide for planned maintenance without downtime. Chapter 1. Introduction 21
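As a concrete illustration of moving workload for planned maintenance, MSCS can be driven from the command line as well as from the Cluster Administrator GUI. The following sketch uses the cluster.exe utility shipped with the Windows 2000 cluster service; the group and node names are invented, and option syntax can vary slightly between Windows releases:

  REM Sketch: list cluster groups and move one to the other node (run from a command prompt)
  cluster group
  REM Show the resources defined in the cluster and their owning groups
  cluster resource
  REM Move a group (and all of its resources) to the other node for maintenance
  cluster group "TWS Group" /moveto:NODE2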
MSCS uses software heartbeats to detect failed applications or servers. In the event of a server failure, it employs a shared nothing clustering architecture that automatically transfers ownership of resources (such as disk drives and IP addresses) from a failed server to a surviving server. It then restarts the failed server's workload on the surviving server. All of this, from detection to restart, typically takes under a minute.
If an individual application fails (but the server does not), MSCS will try to restart the application on the same server. If that fails, it moves the application's resources and restarts it on the other server.
MSCS does not require any special software on client computers, so the user experience during failover depends on the nature of the client side of the client-server application. Client reconnection is often transparent because MSCS restarts the application using the same IP address. If a client is using stateless connections (such as a browser connection), then it would be unaware of a failover if it occurred between server requests. If a failure occurs when a client is connected to the failed resources, then the client will receive whatever standard notification is provided by the client side of the application in use. For a client-side application that has stateful connections to the server, a new logon is typically required following a server failure.
No manual intervention is required when a server comes back online following a failure. As an example, when a server that is running Microsoft Cluster Server (server A) boots, it starts the MSCS service automatically. MSCS in turn checks the interconnects to find the other server in its cluster (server B). If server A finds server B, then server A rejoins the cluster and server B updates it with current cluster information. Server A can then initiate a failback, moving back failed-over workload from server B to server A.

Microsoft Cluster Service concepts
Microsoft provides an overview of MSCS in a white paper that is available at:
http://www.microsoft.com/ntserver/ProductInfo/Enterprise/clustering/ClustArchit.asp
The key concepts of MSCS are covered in this section.

Shared nothing
Microsoft Cluster employs a shared nothing architecture in which each server owns its own disk resources (that is, they share nothing at any point in time). In the event of a server failure, a shared nothing cluster has software that can transfer ownership of a disk from one server to another.
Cluster Services
Cluster Services is the collection of software on each node that manages all cluster-specific activity.

Resource
A resource is the canonical item managed by the Cluster Service. A resource may include physical hardware devices (such as disk drives and network cards), or logical items (such as logical disk volumes, TCP/IP addresses, entire applications, and databases).

Group
A group is a collection of resources to be managed as a single unit. A group contains all of the elements needed to run a specific application and for client systems to connect to the service provided by the application. Groups allow an administrator to combine resources into larger logical units and manage them as a unit. Operations performed on a group affect all resources within that group.

Fallback
Fallback (also referred to as failback) is the ability to automatically rebalance the workload in a cluster when a failed server comes back online. This is a standard feature of MSCS. For example, say server A has crashed, and its workload failed over to server B. When server A reboots, it finds server B and rejoins the cluster. It then checks to see if any of the cluster groups running on server B would prefer to be running on server A. If so, it automatically moves those groups from server B to server A. Fallback properties include information such as which groups can fall back, which server is preferred, and during which hours a fallback may take place. These properties can all be set from the cluster administration console.

Quorum Disk
A Quorum Disk is a disk spindle that MSCS uses to determine whether another server is up or down.
When a cluster member is booted, it searches whether the cluster software is already running in the network:
  If it is running, the cluster member joins the cluster.
  If it is not running, the booting member establishes the cluster in the network.
A problem may occur if two cluster members are restarting at the same time, each trying to form its own cluster. This potential problem is solved by the Quorum Disk concept. This is a resource that can be owned by only one server at a time, and for which servers negotiate ownership. The member that has the Quorum Disk creates the cluster. If the member that has the Quorum Disk fails, the resource is reallocated to another member, which, in turn, creates the cluster.
    • Negotiating for the quorum drive allows MSCS to avoid split-brain situations where both servers are active and think the other server is down. Load balancing Load balancing is the ability to move work from a very busy server to a less busy server. Virtual server A virtual server is the logical equivalent of a file or application server. There is no physical component in the MSCS that is a virtual server. A resource is associated with a virtual server. At any point in time, different virtual servers can be owned by different cluster members. The virtual server entity can also be moved from one cluster member to another in the event of a system failure.1.5 When to implement IBM Tivoli Workload Scheduler high availability Specifying the appropriate level of high availability for IBM Tivoli Workload Scheduler often depends upon how much reliability needs to be built into the environment, balanced against the cost of solution. High availability is a spectrum of options, driven by what kind of failures you want IBM Tivoli Workload Scheduler to survive. These options lead to innumerable permutations of high availability configurations and scenarios. Our goal in this redbook is to demonstrate enough of the principles in configuring IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework to be highly available in a specific, non-trivial scenario such that you can use the principles to implement other configurations.1.5.1 High availability solutions versus Backup Domain Manager IBM Tivoli Workload Scheduler provides a degree of high availability through its Backup Domain Manager feature, which can also be implemented as a Backup Master Domain Manager. This works by duplicating the changes to the production plan from a Domain Manager to a Backup Domain Manager. When a failure is detected, a switchmgr command is issued to all workstations in the Domain Manager server’s domain, causing these workstations to recognize the Backup Domain Manager. However, properly implementing a Backup Domain Manager is difficult. Custom scripts have to be developed to implement sensing a failure, transferring the scheduling objects database, and starting the switchmgr command. The code for sensing a failure is by itself a significant effort. Possible failures to code for24 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • include network adapter failure, disk I/O adapter failure, network communicationsfailure, and so on.If any jobs are run on the Domain Manager, the difficulty of implementing aBackup Domain Manager becomes even more obvious. In this case, the customscripts also have to convert the jobs to run on the Backup Domain Manager, forinstance by changing all references to the workstation name of the DomainManager to the workstation name of the Backup Domain Manager, and changingreferences to the hostname of the Domain Manager to the hostname of theBackup Domain Manager.Then even more custom scripts have to be developed to migrate schedulingobject definitions back to the Domain Manager, because once the failure hasbeen addressed, the entire process has to be reversed. The effort required canbe more than the cost of acquiring a high availability product, which addressesmany of the coding issues that surround detecting hardware failures. The TotalCost of Ownership of maintaining the custom scripts also has to be taken intoaccount, especially if jobs are run on the Domain Manager. All the nuances ofensuring that the same resources that jobs expect on the Domain Manager aremet on the Backup Domain Manager have to be coded into the scripts, thendocumented and maintained over time, presenting a constant drain on internalprogramming resources.High availability products like IBM HACMP and Microsoft Cluster Service providea well-documented, widely-supported means of expressing the requiredresources for jobs that run on a Domain Manager. This makes it easy to addcomputational resources (for example, disk volumes) that jobs require into thehigh availability infrastructure, and keep it easily identified and documented.Software failures like a critical IBM Tivoli Workload Scheduler process crashingare addressed by both the Backup Domain Manager feature and IBM TivoliWorkload Scheduler configured for high availability. In both configurations,recovery at the job level is often necessary to resume the production day.Implementing high availability for Fault Tolerant Agents cannot be accomplishedusing the Backup Domain Manager feature. Providing hardware high availabilityfor a Fault Tolerant Agent server can be accomplished through custom scripting,but using a high availability solution is strongly recommended.Table 1-1 on page 26 illustrates the comparative advantages of using a highavailability solution versus the Backup Domain Manager feature to deliver ahighly available IBM Tivoli Workload Scheduler configuration. Chapter 1. Introduction 25
Table 1-1 Comparative advantages of using a high availability solution
Solution   Hardware   Software   FTA   Cost
HA         Yes        Yes        Yes   TCO: $$
BMDM       -          Yes        -     Initially: $; TCO: $$

1.5.2 Hardware failures to plan for
When identifying the level of high availability for IBM Tivoli Workload Scheduler, potential hardware failures you want to plan for can affect the kind of hardware used for the high availability solution. In this section, we address some of the hardware failures you may want to consider when planning for high availability for IBM Tivoli Workload Scheduler.
Site failure occurs when an entire computer room or data center becomes unavailable. Mitigating this failure involves geographically separate nodes in a high availability cluster. Products like IBM High Availability Geographic Cluster system (HAGEO) deliver a solution for geographic high availability. Consult your IBM service provider for help with implementing geographic high availability.
Server failure occurs when a node in a high availability cluster fails. The minimum response to mitigate this failure mode is to make a backup node available. However, you might also want to consider providing more than one backup node if the workstation you are making highly available is important enough to warrant redundant backup nodes. In this redbook we show how to implement a two-node cluster, but additional nodes are a natural extension of a two-node configuration. Consult your IBM service provider for help with implementing multiple-node configurations.
Network failures occur when either the network itself (through a component like a router or switch), or network adapters on the server, fail. This type of failure is often addressed with redundant network paths in the former case, and redundant network adapters in the latter case.
Disk failure occurs when a shared disk in a high availability cluster fails. Mitigating this failure mode often involves a Redundant Array of Independent Disks (RAID) array. However, even a RAID array can fail catastrophically if two or more disk drives fail at the same time, if a power supply fails, or if a backup power supply fails at the same time as a primary power supply. Planning for these catastrophic failures usually involves creating one or more mirrors of the RAID array, sometimes even on separate array hardware. Products like the IBM TotalStorage® Enterprise Storage Server® (ESS) and TotalStorage 7133 Serial Disk System can address these kinds of advanced disk availability requirements.
    • These are only the most common hardware failures to plan for. Other failures may also be considered while planning for high availability.1.5.3 Summary In summary, for all but the simplest configuration of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework, using a high availability solution to deliver high availability services is the recommended approach to satisfy high availability requirements. Identifying the kinds of hardware and software failures you want your IBM Tivoli Workload Scheduler installation to address with high availability is a key part of creating an appropriate high availability solution.1.6 Material covered in this book In the remainder of this redbook, we focus upon the applicable high availability concepts for IBM Tivoli Workload Scheduler, and two detailed implementations of high availability for IBM Tivoli Workload Scheduler, one using IBM HACMP and the other using Microsoft Cluster Service. In particular, we show you: Key architectural design issues and concepts to consider when designing highly available clusters for IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework; refer to Chapter 2, “High level design and architecture” on page 31. How to implement an AIX HACMP and Microsoft Cluster Service cluster; refer to Chapter 3, “High availability cluster implementation” on page 63. How to implement a highly available installation of IBM Tivoli Workload Scheduler, and a highly available IBM Tivoli Workload Scheduler with IBM Tivoli Management Framework, on AIX HACMP and Microsoft Cluster Service; refer to Chapter 4, “IBM Tivoli Workload Scheduler implementation in a cluster” on page 183. How to implement a highly available installation of IBM Tivoli Management Framework on AIX HACMP and Microsoft Cluster Service; refer to Chapter 5, “Implement IBM Tivoli Management Framework in a cluster” on page 415. The chapters are generally organized around the products we cover in this redbook: AIX HACMP, Microsoft Cluster Service, IBM Tivoli Workload Scheduler, and IBM Tivoli Management Framework. The nature of high availability design and implementation requires that some products and the high availability tool be considered simultaneously, especially during the planning Chapter 1. Introduction 27
    • stage. This tends to lead to a haphazard sequence when applied along any thematic organization, except a straight cookbook recipe approach. We believe the best results are obtained when we present enough of the theory and practice of implementing highly available IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework installations so that you can apply the illustrated principles to your own requirements. This rules out a cookbook recipe approach in the presentation, but readers who want a “recipe” will still find value in this redbook. If you are particularly interested in following a specific configuration we show in this redbook from beginning to end, the following chapter road map gives the order that you should read the material. If you are not familiar with high availability in general, and AIX HACMP or Microsoft Cluster Service in particular, we strongly recommend that you use the introductory road map shown in Figure 1-13. Chapter 1 Chapter 2 Figure 1-13 Introductory high availability road map If you want an installation of IBM Tivoli Workload Scheduler in a highly available configuration by itself, without IBM Tivoli Management Framework, the road map shown in Figure 1-14 on page 29 gives the sequence of chapters to read. This would be appropriate, for example, for implementing a highly available Fault Tolerant Agent.28 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Chapter 3 Chapter 4 (except for Framework sections)Figure 1-14 Road map for implementing highly available IBM Tivoli Workload Scheduler(no IBM Tivoli Management Framework, no Job Scheduling Console access throughcluster nodes)If you want to implement an installation of IBM Tivoli Workload Scheduler withIBM Tivoli Management Framework, use the road map shown in Figure 1-15. Chapter 3 Chapter 4Figure 1-15 Road map for implementing IBM Tivoli Workload Scheduler in a highlyavailable configuration, with IBM Tivoli Management FrameworkIf you want to implement an installation of IBM Tivoli Management Framework ina highly available configuration by itself, without IBM Tivoli Workload Scheduler,the road map shown in Figure 1-16 on page 30 should be used. This would beappropriate, for example, for implementing a stand-alone IBM Tivoli ManagementFramework server as a prelude to installing and configuring other IBM Tivoliproducts. Chapter 1. Introduction 29
    • Chapter 3 Chapter 5 Figure 1-16 Road map for implementing IBM Tivoli Management Framework by itself High availability design is a very broad subject. In this redbook, we provide representative scenarios meant to demonstrate to you the issues that must be considered during implementation. Many ancillary issues are briefly mentioned but not explored in depth here. For further information, we encourage you to read the material presented in “Related publications” on page 611.30 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • 2 Chapter 2. High level design and architecture Implementing a high availability cluster is an essential task for most mission-critical systems. In this chapter, we present a high level overview of HA clusters. We cover the following topics: “Concepts of high availability clusters” on page 32 “Hardware configurations” on page 43 “Software configurations” on page 46© Copyright IBM Corp. 2004. All rights reserved. 31
    • 2.1 Concepts of high availability clusters Today, as more and more business and non-business organizations rely on their computer systems to carry out their operations, ensuring high availability (HA) to their computer systems has become a key issue. A failure of a single system component could result in an extended denial of service. To avoid or minimize the risk of denial of service, many sites consider an HA cluster to be a high availability solution. In this section we describe what an HA cluster is normally comprised of, then discuss software/hardware considerations and introduce possible ways of configuring an HA cluster.2.1.1 A bird’s-eye view of high availability clusters We start with defining the components of a high availability cluster. Basic elements of a high availability cluster A typical HA cluster, as introduced in Chapter 1, “Introduction” on page 1, is a group of machines networked together sharing external disk resources. The ultimate purpose of setting up an HA cluster is to eliminate any possible single points of failure. By eliminating single points of failure, the system can continue to run, or recover in an acceptable period of time, with minimal impact to the end users. Two major elements make a cluster highly available: A set of redundant system components Cluster software that monitors and controls these components in case of a failure Redundant system components provide backup in case of a single component failure. In an HA cluster, an additional server(s) is added to provide server-level backups in case of a server failure. Components in a server, such as network adapters, disk adapters, disks and power supplies, are also duplicated to eliminate single points of failure. However, simply duplicating system components does not provide high availability, and cluster software is usually employed to control them. Cluster software is the core element in HA clusters. It is what ties system components into clusters and takes control of those clusters. Typical cluster software provides a facility to configure clusters and predefine actions to be taken in case of a component failure. The basic function of cluster software in general is to detect component failure and control the redundant components to restore service after a failure. In the event of a component failure, cluster software quickly transfers whatever service32 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
the failed component provided to a backup component, thus ensuring minimum downtime. There are several cluster software products in the market today; Table 2-1 lists common cluster software for each platform.

Table 2-1 Commonly used cluster software - by platform
Platform type   Cluster software
AIX   HACMP
HP-UX   MC/ServiceGuard
Solaris   Sun Cluster, Veritas Cluster Service
Linux   SCYLD Beowulf, Open Source Cluster Application Resources (OSCAR), IBM Tivoli System Automation
Microsoft Windows   Microsoft Cluster Service

Each cluster software product has its own unique benefits, and the terminologies and technologies may differ from product to product. However, the basic concepts and functions that most cluster software provides have much in common. In the following sections we describe how an HA cluster is typically configured and how it works, using simplified examples.

Typical high availability cluster configuration
Most cluster software offers various options to configure an HA cluster. Configurations depend on the system's high availability requirements and the cluster software used. Though there are several variations, the two configuration types most often discussed are idle (or hot) standby and mutual takeover.
Basically, a hot standby configuration assumes a second physical node capable of taking over for the first node. The second node sits idle except in the case of a fallover. Meanwhile, the mutual takeover configuration consists of two nodes, each with its own set of applications, that can take on the function of the other in case of a node failure. In this configuration, each node should have sufficient machine power to run the jobs of both nodes in the event of a node failure. Otherwise, the applications of both nodes will run in a degraded mode after a fallover, since one node is doing the job previously done by two. Mutual takeover is usually considered to be a more cost-effective choice, since it avoids having a system installed just for hot standby.
Figure 2-1 on page 34 shows a typical mutual takeover configuration. Using this figure as an example, we will describe what comprises an HA cluster. Keep in mind that this is just an example of an HA cluster configuration. Mutual takeover is a popular configuration; however, it may or may not be the best high
    • availability solution for you. For a configuration that best matches your requirements, consult your service provider. Cluster_A subnet1 subnet2 net_hb App_A App_B Disk_A Disk_B Disk_A Disk_B mirror mirror Node_A Node_BFigure 2-1 A typical HA cluster configuration As you can see in Figure 2-1, Cluster_A has Node_A and Node_B. Each node is running an application. The two nodes are set up so that each node is able to provide the function of both nodes in case a node or a system component on a node fails. In normal production, Node_A runs App_A and owns Disk_A, while Node_B runs App_B and owns Disk_B. When one of the nodes fail, the other node will acquire ownership of both disks and run both applications. Redundant hardware components are the bottom-line requirement to enable a high availability scenario. In the scenario shown here, notice that most hardware components are duplicated. The two nodes are each connected to two physical TCP/IP networks, subnet1 and subnet2, providing an alternate network connection in case of a network component failure. They share a same set of external disks, Disk_A and Disk_B, each mirrored to prevent the loss of data in case of a disk failure. Both nodes have a path to connect to the external disks. This enables one node to acquire owner ship of an external disk owned by34 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • another node in case of a node failure. For example, if Node_A fails, Node_B canacquire ownership of Disk_A and resume whatever service that requires Disk_A.Disk adapters connecting the nodes and the external disks are duplicated toprovide backup in the event of a disk adapter failure.In some cluster configurations, there may be an additional non-TCP/IP networkthat directly connects the two nodes, used for heartbeats. This is shown in thefigure as net_hb. To detect failures such as network and node failure, mostcluster software uses the heartbeat mechanism.Each node in the cluster sends ‘‘heartbeat’’ packets to its peer nodes overTCP/IP network and/or non-TCP/IP network. If heartbeat packets are notreceived from the peer node for a predefined amount of time, the cluster softwareinterprets it as a node failure.When using only TCP/IP networks to send heartbeats, it is difficult to differentiatenode failures from network failures. Because of this, most cluster softwarerecommends (or require) a dedicated point-to-point network for sendingheartbeat packets. Used together with TCP/IP networks, the point-to-pointnetwork prevents cluster software from misinterpreting network componentfailure as node failure. The network type for this point-to-point network may varydepending on the type of network the cluster software supports. RS-232C,Target Mode SCSI, Target Mode SSA is supported for point-to-point networks insome cluster software.Managing system componentsCluster software is responsible for managing system components in a cluster. Itis typically installed on the local disk of each cluster node. There is usually a setof processes or services that is running constantly on the cluster nodes. Itmonitors system components and takes control of those resources whenrequired. These processes or services are often referred to as the clustermanager.On a node, applications and other system components that are required by thoseapplications are bundled into a group. Here, we refer to each application andsystem component as resource, and refer to a group of these resources asresource group.A resource group is generally comprised of one or more applications, one ormore logical storages residing on an external disk, and an IP address that is notbound to a node. There may be more or fewer resources in the group, dependingon application requirements and how much the cluster software is able tosupport. Chapter 2. High level design and architecture 35
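To make this more concrete, the following sketch shows the kind of attributes typically collected for one resource group, expressed here simply as shell variables that takeover scripts could source. The names and values are invented for illustration and are not the configuration syntax of any particular cluster product:

  #!/bin/ksh
  # grp_app1.env - illustrative description of one resource group (invented values)
  RG_NAME=grp_app1                 # resource group name
  RG_PRIMARY_NODE=nodeA            # node that owns the group in normal production
  RG_STANDBY_NODE=nodeB            # node that takes over after a failure
  RG_SERVICE_IP=10.1.1.50          # IP address that moves with the group
  RG_VOLUME_GROUPS="vg_app1"       # shared storage owned by the group
  RG_FILESYSTEMS="/app1/data /app1/logs"         # filesystems on that storage
  RG_APP_START=/usr/local/cluster/start_app1.sh  # application start script
  RG_APP_STOP=/usr/local/cluster/stop_app1.sh    # application stop script

Cluster products capture the same information through their own configuration tools; the point is simply that everything the application needs in order to run on either node is named in one place.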
A resource group is associated with two or more nodes in the cluster. A resource group is the unit that a cluster manager uses to move resources from one node to another. In normal production it resides on its primary node; in the event of a node or component failure on the primary node, the cluster manager will move the group to another node.
Figure 2-2 shows an example of resources and resource groups in a cluster.

Figure 2-2 Resource groups in a cluster

In Figure 2-2, a resource group called GRP_1 is comprised of an application called APP1, and external disks DISK1 and DISK2. IP address 192.168.1.101 is associated to GRP_1. The primary node for GRP_1 is Node_A, and the secondary node is Node_B. GRP_2 is comprised of application APP2, disks DISK3 and DISK4, and IP address 192.168.1.102. For GRP_2, Node_B is the primary node and Node_A is the secondary node.
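What it means for a node to acquire ownership of a resource group is easiest to see as a script. The following is a minimal sketch of the start (takeover) processing a cluster manager might drive to bring GRP_1 from Figure 2-2 online on a node, using AIX-style commands; the volume group, interface, mount point, and script names are invented, and real cluster products generate or wrap this logic themselves:

  #!/bin/ksh
  # Sketch: bring resource group GRP_1 online on this node
  set -e

  # 1. Acquire the shared storage that belongs to the group
  varyonvg vg_grp1                 # activate the shared volume group (DISK1/DISK2)
  mount /grp1/data                 # mount the filesystems defined on it

  # 2. Bring up the service IP address that clients use to reach APP1
  ifconfig en1 alias 192.168.1.101 netmask 255.255.255.0

  # 3. Start the application itself
  su - app1user -c "/grp1/app/bin/start_app1.sh"

The corresponding stop script would release the same resources in the reverse order.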
    • Fallover and fallback of a resource groupIn normal production, cluster software constantly monitors the cluster resourcesfor any signs of failure. As soon as a cluster manager running on a node detectsa node or a component failure, it will quickly acquire the ownership of theresource group and restart the application.In our example, assume a case where Node_A crashed. Through heartbeats,Node_B detects Node_A’s failure. Because Node_B is configured as asecondary node for resource GRP_1, Node_B’s cluster manager acquiresownership of resource group GRP_1. As a result, DISK1 and DISK2 aremounted on Node_B, and the IP address associated to GRP_1 has moved toNode_B.Using these resources, Node_B will restart APP1, and resume applicationprocessing. Because these operations are initiated automatically based onpre-defined actions, it is a matter of minutes before processing of APP1 isrestored. This is called a fallover. Figure 2-3 on page 38 shows an image of thecluster after fallover. Chapter 2. High level design and architecture 37
    • Cluster_A 192.168.1.102 192.168.1.101 APP1 DISK1 DISK3 APP2 DISK2 DISK4 Node_A Node_B Resource Group: GRP_2 Application: APP2 Disk: DISK3, DISK4 IP Address: 192.168.1.102 Resource Group: GRP_1 Application: APP1 Disk: DISK1, DISK2 IP Address: 192.168.1.101Figure 2-3 Fallover of a resource group Note that this is only a typical scenario of a fallover. Most cluster software is capable of detecting both hardware and software component failures, if configured to do so. In addition to basic resources such as nodes, network, disks, what other resources could be monitored differs by product. Some cluster software may require more or less configuration to monitor the same set of resources. For details on what your choice of cluster software can monitor, consult your service provider. After a node recovers from a failure, it rejoins the cluster. Depending on the cluster configuration, the resource group that failed over to a standby node is returned to the primary node at the time of rejoining. In this Redbook, we refer to this cluster behavior as fallback.38 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • To describe this behavior using our example, when fallback is initiated, resource group GRP_1 moves back to Node_A and returns to its normal production state as shown in Figure 2-2 on page 36. There are some considerations about fallback. These are summarized in 2.1.2, “Software considerations” on page 39 under Fallback policy. As described, cluster software addresses node failure by initiating a fallover of a resource group from the failed node to the standby node. A failed node would eventually recover from a failure and rejoin the cluster. After the rejoining of the failed node, you would have the choice of either keeping the resource group on the secondary node, or relocating the resource group to the original node. If you choose the latter option, then you should consider the timing of when to initiate the fallback. Most cluster software provides options on how a resource group should be managed in the event of a node rejoining the cluster. Typically you would have the option of either initiating a fallback automatically when the node rejoins the cluster, or have the node just rejoin the cluster and manually initiate a fallback whenever appropriate. When choosing to initiate an automatic fallback, be aware that this initiates a fallback regardless of the application status. A fallback usually requires stopping the application on the secondary node and restarting the application on the primary node. Though a fallback generally takes place in a short period of time, this may disrupt your application processing. To implement a successful HA cluster, certain software considerations and hardware considerations should be met. In the following section, we describe what you need to consider prior to implementing HA clusters.2.1.2 Software considerations In order to make your application highly available, you must either use the high availability functions that your application provides, or put them under the control of cluster software. Many sites look to cluster software as a solution to ensure application high availability, as it is usually the case that high availability functions within an application do not withstand hardware failure. Though most software programs are able to run in a multi-node HA cluster environment and are controllable by cluster software, there are certain considerations to take into account. If you plan to put your application under control of any cluster software, check the following criteria to make sure your application is serviced correctly by cluster software. Application behavior First think about how your application behaves in a single-node environment. Then consider how your application may behave in a multi-node HA cluster. This Chapter 2. High level design and architecture 39
    • determines how you should set up your application. Consider where you should place your application executables, and how you should configure your application to achieve maximum availability. Depending on how your application works, you may have to install them on a shared disk, or just have a copy of the software on the local disk of the other node. If several instances of the same application may run on one node in the event of a fallover, make sure that your application supports such a configuration. Licensing Understand your application licensing requirements and make sure the configuration you plan is not breaching the application license agreements. Some applications are license-protected by incorporating processor-specific information into each instance of application installed. This means that even though you implement your application appropriately and the cluster hardware handles the application correctly in case of a fallover, the application may not be able to start because of your license restrictions. Make sure you have licenses for each node in the cluster that may run your applications. If you plan to have several instances of the same application running on one node, ensure you have the license for each instance. Dependencies Check your application dependencies. When configuring your software for an HA cluster, it is important that you know what your applications are dependent upon, but it is even more important to know what your application should not be dependent upon. Make sure your application is independent of any node-bound resources. Any applications dependent on a resource that is bound to a particular node may have dependency problems, as those resources are usually not attached or accessible to the standby node. Things like binaries or configuration files installed on locally attached drives, hard coding to a particular device in a particular location, and hostname dependencies could become a potential dependency issue. Once you have confirmed that your application does not depend on any local resource, define which resource needs to be in place to run your application. Common dependencies are data on external disks and an IP address for client access. Check to see if your application needs other dependencies. Automation Most cluster software uses scripts or agents to control software and hardware components in a cluster. For this reason, most cluster software requires that any application handled by it must be able to start and stop by command without manual intervention. Scripts to start and stop your applications are generally required. Make sure your application provides startup and shutdown commands.40 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
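As a sketch of what such commands can look like (the paths and user name are invented), a simple wrapper script gives the cluster software a single, non-interactive interface to the application:

  #!/bin/ksh
  # app_control.sh - start/stop wrapper that a cluster manager can call unattended
  APP_HOME=/shared/app             # example path on the shared disk
  case "$1" in
    start)  su - appuser -c "$APP_HOME/bin/startup"  >/dev/null 2>&1 ;;
    stop)   su - appuser -c "$APP_HOME/bin/shutdown" >/dev/null 2>&1 ;;
    status) ps -ef | grep -v grep | grep "$APP_HOME/bin" >/dev/null ;;
    *)      echo "Usage: $0 {start|stop|status}"; exit 1 ;;
  esac
  exit $?                          # the cluster software acts on the exit code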
    • Also, make sure that those commands do not prompt you for operator replies. If you plan to have your application monitored by the cluster software, you may have to develop a script to check the health of your application. Robustness Applications should be stable enough to withstand sudden hardware failure. This means that your application should be able to restart successfully on the other node after a node failure. Tests should be executed to determine if a simple restart of the application is sufficient to recover your application after a hardware failure. If further steps are needed, verify that your recovery procedure could be automated. Fallback policy As described in “Fallover and fallback of a resource group” on page 37, cluster software addresses node failure by initiating a fallover of the resource group from the failed node to the standby node. A failed node would eventually recover from a failure and rejoin the cluster. After the rejoining of the failed node, you would have the choice of either keeping the resource group on the secondary node or relocating the resource group to the original node. If you choose to relocate the resource group to the original node, then you should consider the timing of when to initiate the fallback. Most cluster software gives you options on how a resource group should be managed in the event of a node rejoining the cluster. Typically you would have the option of either initiating a fallback automatically when the node rejoins the cluster, or having the node just rejoin the cluster and manually initiate a fallback whenever appropriate. When choosing to initiate an automatic fallback, be aware that this initiates a fallback regardless of the application status. A fallback usually requires stopping the application on the secondary node and restarting the application on the primary node. Though a fallback generally takes place in a short period of time, this may disrupt your application processing.2.1.3 Hardware considerations In this case, hardware considerations involve how to provide redundancy. A cluster that provides maximum high availability is a cluster with no single points of failure. A single point of failure exists when a critical cluster function is provided by a single component. If that component fails, the cluster has no way of providing that function, and the application or service dependent on that component becomes unavailable. An HA cluster is able to provide high availability for most hardware components when redundant hardware is supplied and the cluster software is configured to Chapter 2. High level design and architecture 41
take control of them. Preventing hardware components from becoming single points of failure is not a difficult task; simply duplicating them and configuring the cluster software to handle them in the event of a failure should solve the problem for most components.
However, we remind you again that adding redundant hardware components is usually associated with a cost. You may have to make compromises at some point. Consider the priority of your application. Balance the cost of the failure against the cost of additional hardware and the workload it takes to configure high availability. Depending on the priority and the required level of availability for your application, manual recovery procedures after notifying the system administrator may be enough.
In Table 2-2 we point out basic hardware components which could become a single point of failure, and describe how to address them. Some components simply need to be duplicated, with no additional configuration, because the hardware in which they reside automatically switches over to the redundant component in the event of a failure. For other components you may have to perform further configuration to handle them, or write custom code to detect their failure and trigger recovery actions. This may vary depending on the cluster software you use, so consult your service provider for detailed information.

Table 2-2 Eliminating single points of failure
Hardware component   Measures to eliminate single points of failure
Node   Set up a standby node. An additional node could be a standby for one or more nodes. If an additional node will just be a "hot standby" for one node during production, a node with the same machine power as the active node is sufficient. If you are planning a mutual takeover, make sure the node has enough power to execute all the applications that will run on that server in the event of a fallover.
Power source   Use multiple circuits or uninterruptible power supplies (UPS).
Network adapter   To recover from a network adapter failure, you will need at least two network adapters per node. If your cluster software requires a dedicated TCP/IP network for heartbeats, additional network adapters may be added.
Network   Have multiple networks to connect nodes.
    • Hardware component Measures to eliminate single points of failure TCP/IP subsystem Use a point-to-point network to connect nodes in the cluster. Most cluster software requires, or recommends, at least one active network (TCP/IP or non-TCP/IP) to send “heartbeats” to the peer nodes. By providing a point-to-point network, cluster software will be able to distinguish a network failure from a node failure. For cluster software that does not support non-TCP/IP network for heartbeats, consult your service provider for ways to eliminate TCP/IP subsystem as a single point of failure. Disk adapter Add an additional disk adapter to each node. When cabling your disks, make sure that each disk adapter has access to each external disk. This enables an alternate access path to external disks in case of a disk adapter failure. Disk controller Use redundant disk controllers. Disk Provide redundant disks and enable RAID to protect your data from disk failures.2.2 Hardware configurations In this section, we discuss the different types of hardware cluster, concentrating on disk clustering rather than network or IP load balancing scenarios. We also examine the differences between a hardware cluster and a hot standby system.2.2.1 Types of hardware cluster There are many types of hardware clustering configurations, but here we concentrate on four different configurations: two-node cluster, multi-node cluster, grid computing, and disk mirroring (these terms may vary, depending on the hardware manufacturer). Two-node cluster A two-node cluster is probably the most common form of hardware cluster configuration; it consists of two nodes which are able to access a disk system that is externally attached to the two nodes, as shown in Figure 2-4 on page 44. The external drive system can be attached over the LAN or SAN network (SSA Disk system), or even by local SCSI cables. This type of cluster is used when configuring only a couple of applications in a high availability cluster. This type of configuration can accommodate either Chapter 2. High level design and architecture 43
    • Active/Passive or Active/Active, depending on the operating system and cluster software that is used. Public Network Connection Private Network Connection Shared Disk Node1 Node2 Figure 2-4 Two-node cluster Multi-node cluster In a multi-node cluster, we have between two and a number of nodes that can access the same disk system, which is externally attached to this group of nodes, as shown in Figure 2-5 on page 45. The external disk system can be over the LAN or SAN. This type of configuration can be used for extra fault tolerance where, if Node1 were to fail, then all work would move onto Node2—but if Node2 were to fail as well, then all work would then move on to the next node, and so on. It also can support many applications running simultaneously across all nodes configured in this cluster. The number of nodes that this configuration can support depends on the hardware and software manufacturers.44 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Public Network Connection Private Network Private Network Private Network Connection Connection Connection Node1 Node2 Node3 Node4 Shared DiskFigure 2-5 Multi-node clusterGrid computingEven though grid computing is not necessarily considered a cluster, it acts likeone, so we will explain the concepts involved. Grid computing is based on theconcept that the IT infrastructure can be managed as a collection of distributedcomputing resources available over a network that appear to an end user orapplication as one large virtual computing system.A grid can span locations, organizations, machine architectures, and softwareboundaries to provide unlimited power, collaboration, and information access toeveryone connected to the grid. Grid computing enables you to delivercomputing power to applications and users that need it on demand, which is onlywhen they need it for meeting business objectives.Disk mirroringDisk mirroring is more commonly used in a hot standby mode, but it is also usedin some clustering scenarios, especially when mirroring two systems acrosslarge distances; this will depend on the software and or hardware capabilities.Disk mirroring functionality can be performed by software in some applicationsand in some clustering software packages, but it can also be performed at thehardware level where you have a local disk on each side of a cluster and any Chapter 2. High level design and architecture 45
changes made to one side are automatically sent across to the other side, thus keeping the two sides in synchronization.

2.2.2 Hot standby system
This terminology is used for a system that is connected to the network and fully configured, with all the applications loaded but not enabled. It is normally identical, in both hardware and software, to the system it is standing by for. One hot standby system can be on standby for several live systems, which can include application servers that host a Fault Tolerant Agent, an IBM Tivoli Workload Scheduler Master Domain Manager, or a Domain Manager.
The advantage over a hardware cluster is that one server can be configured to stand by for several systems, which cuts the cost dramatically. The disadvantages over a hardware cluster are as follows:
  It is not an automatic switchover, and it can take several minutes or even hours to bring up the standby server.
  The work that was running on the live server has no visibility on the standby server, so an operator would have to know at what point to restart the work on the standby server.
  The standby server has a different name, so the IBM Tivoli Workload Scheduler jobs would not run on this system as defined in the database. Therefore, the IBM Tivoli Workload Scheduler administrator would have to submit the rest of the jobs by hand or create a script to do this work.

2.3 Software configurations
In this section we cover the different ways to implement IBM Tivoli Workload Scheduler in a cluster and also look at some of the currently available software configurations built into IBM Tivoli Workload Scheduler.

2.3.1 Configurations for implementing IBM Tivoli Workload Scheduler in a cluster
Here we describe the different configurations of IBM Tivoli Workload Scheduler workstations, how they are affected in a clustered environment, and why each configuration would be put into a cluster. We will also cover the different types of Extended Agents and how they work in a cluster.
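Before turning to the individual workstation types, it is worth making that last hot standby disadvantage concrete: on a standby server, the remaining workload typically has to be resubmitted and checked manually with conman. A sketch of the kind of commands involved follows; the workstation and job stream names are invented examples, and exact syntax should be checked against your IBM Tivoli Workload Scheduler version:

  # Submit the remainder of a job stream on the standby workstation STANDBY1
  conman "sbs STANDBY1#DAILY_BATCH"

  # Check which job streams and jobs are now in the plan on the standby
  conman "ss STANDBY1#@"
  conman "sj STANDBY1#@.@"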
    • Master Domain ManagerThe Master Domain Manager is the most critical of all the IBM Tivoli WorkloadScheduler workstation configurations. It is strongly recommended to configurethis into a cluster, as it manages and controls the scheduling database. From thisdatabase, it generates and distributes the 24-hour daily scheduling plan called asymphony file. It also controls, coordinates and keeps track of all the schedulingdependences throughout the entire IBM Tivoli Workload Scheduler network.Keep the following considerations in mind when setting up a Master DomainManager in a cluster: Connectivity to the IBM Tivoli Workload Scheduler database Ability of the IBM Tivoli Workload Scheduler installation to locate the components file (this only applies to versions prior to IBM Tivoli Workload Scheduler Version 8.2) Ability of the user interface (IBM Tivoli Workload Scheduler Console) to connect to the new location where IBM Tivoli Workload Scheduler is now running Starting all the IBM Tivoli Workload Scheduler processes and services Coordinating all messages from and to the IBM Tivoli Workload Scheduler network Linking all workstations in its domainLet’s examine these considerations in more detail.IBM Tivoli Workload Scheduler databaseThe IBM Tivoli Workload Scheduler database is held in the same file system asthe installed directory of IBM Tivoli Workload Scheduler. Therefore, providing thisis not being mounted or links to a separate file system, then the database willfollow the IBM Tivoli Workload Scheduler installation.If the version of IBM Tivoli Workload Scheduler used is prior to Version 8.2, thenyou will have to consider the TWShome/../unison/ directory, as this is where partof the database is held (workstation, NT user information); the working securityfile is also held here.The directory TWShome/../unison/ may not be part of the same file system as theTWShome directory, so this will have to be added as part of the cluster package.Because the database is a sequential index link database, there is norequirement to start the database before IBM Tivoli Workload Scheduler canread it. Chapter 2. High level design and architecture 47
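Because both the TWShome and unison directories must move with the cluster package, it is worth verifying, before the cluster software is configured, that they really reside on the shared storage and not on a local disk. A quick check of this kind, with an invented install path, could be:

  #!/bin/ksh
  # Verify that TWShome and the unison directory are on the shared filesystem
  TWSHOME=/shared/tws/maestro      # example install path on the shared disk
  for dir in "$TWSHOME" "$TWSHOME/../unison"
  do
      echo "$dir is on filesystem: $(df -k "$dir" | tail -1 | awk '{print $NF}')"
  done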
IBM Tivoli Workload Scheduler components file
All versions prior to IBM Tivoli Workload Scheduler Version 8.2 require a components file. This file contains the location of both the maestro and Netman installations. On Windows it is installed in the directory c:\win32app\TWS\Unison\netman, and under the UNIX operating system in the /usr/unison directory; it needs to be accessible on both sides of the cluster.

IBM Tivoli Workload Scheduler console
The IBM Tivoli Workload Scheduler console (called the Job Scheduling Console) connects to the IBM Tivoli Workload Scheduler engine through the IBM Tivoli Management Framework (the Framework). The Framework authenticates the logon user, and communicates with the IBM Tivoli Workload Scheduler engine through two Framework modules (Job Scheduling Services and the Job Scheduling Connector). Therefore, you need to consider both the IP address of the Framework and the location of the IBM Tivoli Workload Scheduler engine code.
When a user starts the Job Scheduling Console, it prompts for a user name, the password for that user, and the address of where the Framework is located. This address can be a fully qualified domain name or an IP address, but it must allow the console to reach wherever the Framework is running (after the cluster takeover).
The Job Scheduling Console displays a symbol of an engine. If the IBM Tivoli Workload Scheduler engine is active, the engine symbol displays without a red cross through it. If the IBM Tivoli Workload Scheduler engine is not active, then the engine symbol has a red cross through it, as shown in Figure 2-6.

Figure 2-6 Symbol of IBM Tivoli Workload Scheduler engine availability

Domain Manager
The Domain Manager is the second most critical workstation that needs to be protected in an HA cluster, because it controls, coordinates, and keeps track of all scheduling dependencies between workstations that are defined in the domain that this Domain Manager is managing (which may be hundreds or even a thousand workstations). The considerations that should be kept in mind when setting up a Domain Manager in a cluster are:
    • The ability of the IBM Tivoli Workload Scheduler installation to locate the components file (this only applies to versions prior to IBM Tivoli Workload Scheduler Version 8.2). The ability of the user interface (Job Scheduling Console) to connect to the new location of where IBM Tivoli Workload Scheduler is now running (this is optional, as it is not essential to run the console on this workstation).In addition, the starting of all IBM Tivoli Workload Scheduler processes andservices, the coordination of all messages from and to the IBM Tivoli WorkloadScheduler network, and the linking of all workstations in its domain should betaken into account.Fault Tolerant AgentThe Fault Tolerant Agent may be put in a cluster because a critical applicationneeds to be in a HA environment, so the Fault Tolerant Agent that schedules andcontrols all the batch work needs to be in this same cluster.Keep the following considerations in mind when setting up a Fault Tolerant Agentin a cluster: The ability of the IBM Tivoli Workload Scheduler installation to locate the components file (this only applies to versions prior to IBM Tivoli Workload Scheduler Version 8.2) The ability of the user interface (Job Scheduling Console) to connect to the new location of where IBM Tivoli Workload Scheduler is now running (this is optional, as it is not essential to run the console on this workstation).In addition, the starting of all IBM Tivoli Workload Scheduler processes andservices should be taken into account.Extended AgentsAn Extended Agent (xa or x-agent) serves as an interface to an external,non-IBM Tivoli Workload Scheduler system or application. It is defined as an IBMTivoli Workload Scheduler workstation with an access method and a host. Theaccess method communicates with the external system or application to launchand monitor jobs and test Open file dependencies. The host is another IBM TivoliWorkload Scheduler workstation (except another xa) that resolves dependenciesand issues job launch requests via the method.In this section, we consider the implications of implementing these ExtendedAgents in a HA cluster with the different Extended Agents currently available. Allthe Extended Agents are currently installed partly in the application itself andalso on a IBM Tivoli Workload Scheduler workstation (which can be a Master Chapter 2. High level design and architecture 49
    • Domain Manager, a Domain Manager or an Fault Tolerant Agent), so we need to consider the needs of the type of workstation the Extended Agent is installed on. We will cover each type of Extended Agent in turn. The types of agents that are currently supported are: SAP R/3; Oracle e-Business Suite; PeopleSoft; z/OS access method; and Local and Remote UNIX access. For each Extended Agent, we describe how the access method will work in a cluster. SAP R/3 access method When you install and configure the SAP Extended Agent and then create a workstation definition for the SAP instance you wish to communicate with, there will be an R3batch method in the methods directory. This is a C program that communicates with the remote R3 system. It finds where to run the job by reading the r3batch.opts file, and then matching the workstation name with the first field in the r3batch.opts file. R3batch then reads all the parameters in the matched workstation line, and uses these to communicate with the R/3 system. The parameter that we are interested in is the second field of the r3batch.opts file: R/3 Application Server. This will be a IP address or domain name. In order for the Extended Agent to operate correctly, this system should be accessed from wherever IBM Tivoli Workload Scheduler is running. (This operates in the same way for the Microsoft or the UNIX cluster.) Oracle e-Business Suite access method The Oracle e-Business Suite Extended Agent is installed, configured on the same system as the Oracle Application server. When setting this up in a cluster, you must first configure the Fault Tolerant Agent and Extended Agent to be in the same part of the cluster. When the Oracle Applications x-agent is started, the IBM Tivoli Workload Scheduler host executes the access method mcmagent. Using the x-agent’s workstation name as a key, mcmagent looks up the corresponding entry in the mcmoptions file to determine which instance of Oracle Applications it will connect to. The Oracle Applications x-agent can then launch jobs on that instance of Oracle Applications and monitor the jobs through completion, writing job progress and status information to the job’s standard list file. PeopleSoft access method The PeopleSoft Extended Agent is installed and configured on the same system as the PeopleSoft client. It also requires an IBM Tivoli Workload Scheduler Fault Tolerant Agent to host the PeopleSoft Extended Agent, which is also installed and configured on the same system as the PeopleSoft client.50 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • When setting this configuration up in a cluster, you must first configure the FaultTolerant Agent and Extended Agent to be in the same part of the cluster as thePeopleSoft Client.To launch a PeopleSoft job, IBM Tivoli Workload Scheduler executes the psagentmethod, passing it information about the job. An options file provides the methodwith path, executable and other information about the PeopleSoft processscheduler and application server used to launch the job. The Extended Agentcan then access the PeopleSoft process request table and make an entry in thetable to launch the job. Job progress and status information are written to thejob’s standard list file.z/OS access methodIBM Tivoli Workload Scheduler z/OS access method has three separatemethods, depending on what you would like to communicate to on the z/OSsystem. All of these methods work in the same way, and they are: JES, OPC andCA7. The Extended Agent will communicate to the z/OS gateway over TCP/IP,and will use the parameter HOST in the workstation definition to communicate tothe gateway.When configuring a z/OS Extended Agent in a cluster, be aware that thisExtended Agent is hosted by a Fault Tolerant Agent; the considerations for aFault Tolerant Agent are described in 2.3.1, “Configurations for implementing IBMTivoli Workload Scheduler in a cluster” on page 46.The parameter that we are interested in is in the workstation definition HOST.This will be a IP address or domain name. In order for the Extended Agent tooperate correctly, this system should be accessed from wherever the IBM TivoliWorkload Scheduler is running. (This operates in the same way for the Microsoftor the UNIX cluster.)Figure 2-7 on page 52 shows the architecture of the z/OS access method. Chapter 2. High level design and architecture 51
    • TWS Host mvs access method method.opts Unix or NT Host z/OS System mvs gateway JES2/JES3 OPC CA7 Job Figure 2-7 z/OS access method Local UNIX access method When the IBM Tivoli Workload Scheduler sends a job to a local UNIX Extended Agent, the access method, unixlocl, is invoked by the host to execute the job. The method starts by executing the standard configuration script on the host workstation (jobmanrc). If the job’s logon user is permitted to use a local configuration script and the script exists as $HOME/.jobmanrc, the local configuration script is also executed. The job itself is then executed either by the standard or the local configuration script. If neither configuration script exists, the method starts the job. For the local UNIX Extended Agent to function properly in a cluster, the parameter that we are interested in is host, which is in the workstation definition. This will be an IP address or domain name, and providing that wherever the IBM Tivoli Workload Scheduler is running this system can be accessed, then the Extended Agent will still operate correctly. Remote UNIX access method Note: In this section we explain how this access method works in a cluster; this explanation is not meant to be used as a way to set up and configure this Extended Agent. When the IBM Tivoli Workload Scheduler sends a job to a remote UNIX Extended Agent, the access method, unixrsh, creates a /tmp/maestro directory on the non-IBM Tivoli Workload Scheduler computer. It then transfers a wrapper script to the directory and executes it. The wrapper then executes the scheduled job. The wrapper is created only once, unless it is deleted, moved, or outdated.52 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
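To make the role of the host parameter concrete before discussing it further, the following is a minimal sketch of a composer workstation definition for a remote UNIX Extended Agent. The workstation name RUNIX01, the node name remotebox.acme.com, and the hosting workstation TWS1 are hypothetical values, and the exact attributes accepted may vary with your IBM Tivoli Workload Scheduler version.

   cpuname RUNIX01
     description "Remote UNIX extended agent"
     os UNIX
     node remotebox.acme.com
     for maestro
       host TWS1
       access unixrsh
       type x-agent
   end

Because the Extended Agent is only a definition hosted by a Fault Tolerant Agent, it is the hosting workstation (TWS1 in this sketch) that must be made highly available; the Extended Agent definition itself does not change when its host moves within the cluster.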
    • For the remote UNIX Extended Agent to function properly in a cluster, theparameter that we are interested in is host, which is in the workstation definition.This will be an IP address or domain name, and providing that wherever the IBMTivoli Workload Scheduler is running this system can be accessed, then theExtended Agent will still operate correctly.One instance of IBM Tivoli Workload SchedulerIn this section, we discuss the circumstances under which you might install oneinstance of the IBM Tivoli Workload Scheduler in a high availability cluster.The first consideration is where the product is to be installed: it must be in theshared file system that moves between the two servers in the cluster.The second consideration is how the IBM Tivoli Workload Scheduler instance isaddressed: that must be the IP address that is associated to the cluster.Why to install only one copy of IBM Tivoli Workload SchedulerIn this configuration there may be three reasons for installing only one copy ofIBM Tivoli Workload Scheduler in this cluster: Installing a Master Domain Manager (MDM) in a cluster removes the single point of failure of the IBM Tivoli Workload Scheduler database and makes the entire IBM Tivoli Workload Scheduler network more fault tolerant against failures. Installing a Domain Manager (DM) in a cluster makes the segment of the IBM Tivoli Workload Scheduler network that the Domain Manager manages more fault tolerant against failures. If an application is running in a clustered environment and is very critical to the business, it may have some critical batch scheduling; you could install a Fault Tolerant Agent in the same cluster to handle the batch work.When to install only one copy of IBM Tivoli Workload SchedulerYou would install the workstation in this cluster in order to provide highavailability to an application or to the IBM Tivoli Workload Scheduler network byinstalling the Master Domain Manager in the cluster.Where to install only one copy of IBM Tivoli Workload SchedulerTo take advantage of the cluster, install this instance of IBM Tivoli WorkloadScheduler on the shared disk system that moves between the two sides of thecluster. Chapter 2. High level design and architecture 53
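The addressing consideration above can be illustrated with a hedged sketch of a workstation definition for the clustered instance. The workstation name MDM1 and the service IP label tivcluster_svc are hypothetical; the point is that node refers to the IP label that follows the cluster resource group, not to a physical node name.

   cpuname MDM1
     description "TWS instance running in the HA cluster"
     os UNIX
     node tivcluster_svc
     tcpaddr 31111
     domain MASTERDM
     for maestro
       type fta
       autolink on
       fullstatus on
       resolvedep on
   end

With a definition of this kind, other workstations always reach the instance through the cluster's service IP label, so no workstation definition needs to be changed after a fallover.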
    • What to install Depending on why you are installing one instance of IBM Tivoli Workload Scheduler, you may install a Master Domain Manager, Domain Manager or Fault Tolerant Agent in the cluster. Two instances of IBM Tivoli Workload Scheduler In this section, we discuss the circumstances under which you might install two instances of the IBM Tivoli Workload Scheduler. The first consideration is where the product is to be installed: each IBM Tivoli Workload Scheduler instance must have a different installation directory, and that must be in the shared file system that moves between the two servers in the cluster. Each instance will also have its own installation user. The second consideration is how the IBM Tivoli Workload Scheduler instance is addressed: that must be the IP address that is associated to the cluster. Each IBM Tivoli Workload Scheduler instance must also have its own port number. If the version of IBM Tivoli Workload Scheduler is older than 8.2, then it will need to access the components file from both sides of the cluster to run. If the version of IBM Tivoli Workload Scheduler is 8.2 or higher, then the components file is only needed to be sourced when upgrading IBM Tivoli Workload Scheduler. Why to install two instances of IBM Tivoli Workload Scheduler In this configuration there may be two reasons for installing two copies of IBM Tivoli Workload Scheduler in this cluster: Installing a Master Domain Manager and a Domain Manager in the cluster not only removes the single point of failure of the IBM Tivoli Workload Scheduler database, but also makes the entire IBM Tivoli Workload Scheduler network more fault tolerant against failures. If two applications are running in a clustered environment and they are very critical to the business, they may have some critical batch scheduling; you could install a Fault Tolerant Agent for each application running in the cluster to handle the batch work. When to install two instances of IBM Tivoli Workload Scheduler You would install both instances of IBM Tivoli Workload Scheduler in this cluster in order to give a high availability to an application or to the IBM Tivoli Workload Scheduler network by installing the Master Domain Manager or Domain Manager in this cluster. Where to install two instances of IBM Tivoli Workload Scheduler To take advantage of the cluster, you would install the two instances of IBM Tivoli Workload Scheduler on the shared disk system that moves between the two54 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • sides of the cluster. You would set up the cluster software in such a way that thefirst instance of IBM Tivoli Workload Scheduler would have a preference ofrunning on server A and the second instance would have a preference of runningon server B.What to installDepending on why you are installing two instances of IBM Tivoli WorkloadScheduler, you may install a combination of a Master Domain Manager, DomainManager or Fault Tolerant Agent in the cluster.Three instances of IBM Tivoli Workload SchedulerIn this section, we discuss the circumstances under which you might install threeinstances of the IBM Tivoli Workload Scheduler.The first consideration is where the product is to be installed. When twoinstances of IBM Tivoli Workload Scheduler are running on the same system,you must have each IBM Tivoli Workload Scheduler instance installed in adifferent directory—and one of the instances must be installed in the shared filesystem that moves between the two servers in the cluster. Each instance willhave it own installation user.The second consideration is how the IBM Tivoli Workload Scheduler instance isaddressed. In this case, one will have the IP address that is associated to thecluster, and the other two will have the IP address of each system that is in thiscluster. Each IBM Tivoli Workload Scheduler instance must have its own portnumber. If the version of IBM Tivoli Workload Scheduler is older than 8.2, then itwill need to access the components file from both sides of the cluster to run. Ifthe version of IBM Tivoli Workload Scheduler is 8.2 or higher, then thecomponents file is only needed to be sourced when upgrading IBM TivoliWorkload Scheduler.Why to install three instances of IBM Tivoli Workload SchedulerIn this configuration, only one instance is installed in a high availability mode; theother two are installed on the local disks shown in Figure 2-8 on page 56. Whywould you install IBM Tivoli Workload Scheduler in this configuration? Becausean application is running on both sides of the cluster that cannot be configured ina cluster; therefore, you need to install the IBM Tivoli Workload Schedulerworkstation with the application. Also, you may wish to install the Master DomainManager in the cluster, or an third application is cluster-aware and can move.When to install three instances of IBM Tivoli Workload SchedulerYou would install one instance of the IBM Tivoli Workload Scheduler in thiscluster in order to give high availability to an application or to the IBM TivoliWorkload Scheduler network by installing the Master Domain Manager or Chapter 2. High level design and architecture 55
Domain Manager in this cluster, and one instance of IBM Tivoli Workload Scheduler on each local disk. This second instance may be scheduling batch work for the systems in the cluster, or an application that only runs on the local disk subsystem.
Where to install three instances of IBM Tivoli Workload Scheduler
Install one instance of IBM Tivoli Workload Scheduler on the shared disk system that moves between the two sides of the cluster, and one instance of IBM Tivoli Workload Scheduler on the local disk allocated to each side of the cluster, as shown in Figure 2-8.
What to install
Depending on why you are installing one instance of IBM Tivoli Workload Scheduler as described above, you may install a Master Domain Manager, Domain Manager or Fault Tolerant Agent in the cluster. You would install a Fault Tolerant Agent on each side of the cluster.
[Figure 2-8 shows three TWS engines: one on the shared disk volume, and one on the local disk volume of each system in the cluster.]
Figure 2-8 Three-instance configuration
Multiple instances of IBM Tivoli Workload Scheduler
In this section, we discuss the circumstances under which you might install multiple instances of the IBM Tivoli Workload Scheduler.
The first consideration is where the product is to be installed, because each IBM Tivoli Workload Scheduler instance must have a different installation directory. These installation directories must be in the shared file system that moves between the two servers in the cluster. Each instance will also have its own installation user.
The second consideration is how the IBM Tivoli Workload Scheduler instance is addressed: that must be the IP address that is associated to the cluster. Each IBM Tivoli Workload Scheduler instance must also have its own port number. If the version of IBM Tivoli Workload Scheduler is older than 8.2, then it will need to
    • access the components file from both sides of the cluster to run. If the version of IBM Tivoli Workload Scheduler is 8.2 or higher, then the components file is only needed to be sourced when upgrading IBM Tivoli Workload Scheduler. Why to install multiple instances of IBM Tivoli Workload Scheduler In this configuration there may be many applications running in this cluster, and each application would need to have its own workstation associated with this application. You might also want to install Master Domain Manager and even the Domain Manager in the cluster to make the entire IBM Tivoli Workload Scheduler network more fault tolerant against failures. When to install multiple instances of IBM Tivoli Workload Scheduler You would install multiple instances of IBM Tivoli Workload Scheduler in this cluster to give high availability to an application and to the IBM Tivoli Workload Scheduler network by installing the Master Domain Manager or Domain Manager in this cluster. Where to install multiple instances of IBM Tivoli Workload Scheduler All instances of IBM Tivoli Workload Scheduler would be installed on the shared disk system that moves between the two sides of the cluster. Each instance would need its own installation directory, its own installation user, and its own port address. What to install Depending on why you are installing multiple instances of IBM Tivoli Workload Scheduler, you may install a combination of a Master Domain Manager, Domain Manager or Fault Tolerant Agent in the cluster.2.3.2 Software availability within IBM Tivoli Workload Scheduler In this section we discuss software options currently available with IBM Tivoli Workload Scheduler that will give you a level of high availability if you do not have, or do not want to use, a hardware cluster. Backup Master Domain Manager A Backup Master Domain Manager (BMDM) and the Master Domain Manager (MDM) are critical parts of a highly available IBM Tivoli Workload Scheduler environment. If the production Master Domain Manager fails and cannot be immediately recovered, a backup Master Domain Manager will allow production to continue. The Backup Master Domain Manager must be identified when defining your IBM Tivoli Workload Scheduler network architecture; it must be a member of the Chapter 2. High level design and architecture 57
    • same domain as the Master Domain Manager, and the workstation definition must have the Full Status and Resolve Dependencies modes selected. It may be necessary to transfer files between the Master Domain Manager and its standby. For this reason, the computers must have compatible operating systems. Do not combine UNIX with Windows NT® computers. Also, do not combine little-endian and big-endian computers. When a Backup Master Domain Manager is correctly configured, the Master Domain Manager will send any changes and updates to the production file to the BMDM—but any changes or updates that are made to the database are not automatically sent to the BMDM. In order to keep the BMDM and the MDM databases synchronized, you must manually copy on a daily basis, following start-of-day processing, the TWShomemozart and TWShome..unisonnetwork directories (the unison directory is only for versions older than 8.2). Any changes to the security must be replicated to the BMDM, and configuration files like localopts and globalopts files must also be replicated to the BMDM. The main advantages over a hardware HA solution is that this currently exists in the IBM Tivoli Workload Scheduler product, and the basic configuration where the BMDM takes over the IBM Tivoli Workload Scheduler network for a short-term loss of the MDM is fairly easy to set up. Also, no extra hardware or software is needed to configure this solution. The main disadvantages are that the IBM Tivoli Workload Scheduler database is not automatically synchronized and it is the responsibility of the system administrator to keep both databases in sync. Also, for a long-term loss of the MDM, the BMDM will have to generate a new production day plan and for this an operator will have to submit a Jnextday job on the BMDM. Finally, any jobs or job streams that ran on the MDM will not run on the BMDM, because the workstation names are different. Backup Domain Manager The management of a domain can be assumed by any Fault Tolerant Agent that is a member of the same domain.The workstation definition has to have Full Status and Resolve Dependencies modes selected. When the management of a domain is passed to another workstation, all domain workstations members are informed of the switch, and the old Domain Manager is converted to a Fault Tolerant Agent in the domain. The identification of domain managers is carried forward to each new day’s symphony file, so that switches remain in effect until a subsequent switchmgr command is executed.58 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
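Referring back to the manual synchronization described for the Backup Master Domain Manager above, the following is a minimal sketch of a script that could be scheduled daily after Jnextday. It assumes TWShome is /usr/maestro, the backup master is reachable as host bmdm01, and remote copy as the maestro user is already configured; these names are illustrative, and a production script would add error checking.

   #!/bin/ksh
   # Sketch: copy the TWS database and configuration to the backup master
   TWSHOME=/usr/maestro
   BMDM=bmdm01

   # Scheduling database (add TWShome/../unison/network for versions before 8.2)
   cd $TWSHOME
   tar -cf /tmp/mozart.tar mozart
   rcp /tmp/mozart.tar ${BMDM}:/tmp/mozart.tar
   rsh $BMDM "cd $TWSHOME && tar -xf /tmp/mozart.tar"

   # Security and configuration files
   rcp $TWSHOME/Security           ${BMDM}:$TWSHOME/Security
   rcp $TWSHOME/localopts          ${BMDM}:$TWSHOME/localopts
   rcp $TWSHOME/mozart/globalopts  ${BMDM}:$TWSHOME/mozart/globalopts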
    • Once a new workstation has taken over the responsibility of the domain, it has the ability to resolve any dependencies for the domain it is managing, and also the ability to process any messages to or from the network. Switch manager command The switch manager command is used to transfer the management of a IBM Tivoli Workload Scheduler domain to another workstation. This command can be used on the Master Domain Manager or on a Domain Manager. To use the command switchmgr, the workstation that you would like to have take over the management of a domain must be a member of the same domain. It must also have resolve dependences and full status to work correctly. The syntax of the command is switchmgr domain;newmgr. The command stops a specified workstation and restarts it as the Domain Manager. All domain member workstations are informed of the switch, and the old Domain Manager is converted to a Fault Tolerant Agent in the domain. The identification of Domain Managers is carried forward to each new day’s symphony file, so that switches remain in effect until a subsequent switchmgr command is executed. However, if new day processing (the Jnextday job) is performed on the old domain manager, the domain will act as though another switchmgr command had been executed and the old Domain Manager will automatically resume domain management responsibilities.2.3.3 Load balancing software Using load balancing software is another way of bringing a form of high availability to IBM Tivoli Workload Scheduler jobs; the way to do this is by integrating IBM Tivoli Workload Scheduler with IBM LoadLeveler®, because IBM LoadLeveler will detect if a system is unavailable and reschedule it on one that is available. IBM LoadLeveler is a job management system that allows users to optimize job execution and performance by matching job processing needs with available resources. IBM LoadLeveler schedules jobs and provides functions for submitting and processing jobs quickly and efficiently in a dynamic environment. This distributed environment consists of a pool of machines or servers, often referred to as a LoadLeveler cluster. Jobs are allocated to machines in the cluster by the IBM LoadLeveler scheduler. The allocation of the jobs depends on the availability of resources within the cluster and on rules defined by the IBM LoadLeveler administrator. A user submits a job to IBM LoadLeveler and the scheduler attempts to find resources within the cluster to satisfy the requirements of the job. Chapter 2. High level design and architecture 59
    • At the same time, the objective of IBM LoadLeveler is to maximize the efficiency of the cluster. It attempts to do this by maximizing the utilization of resources, while at the same time minimizing the job turnaround time experienced by users.2.3.4 Job recovery In this section we explain how IBM Tivoli Workload Scheduler will treat a job if it has failed; this is covered in three scenarios. A job abends in a normal job run Prior to IBM Tivoli Workload Scheduler Version 8.2, if a job finished with a return code other than 0, the job was treated as ABENDED. If this was the correct return code for this job, the IBM Tivoli Workload Scheduler administrator would run a wrapper script around the job or change the .jobmanrc to change the job status to SUCCES. In IBM Tivoli Workload Scheduler Version 8.2, however, a new field in the job definition allows you to set a boolean expression for the return code of the job. This new field is called rccondsucc. In this field you are allowed to type in a boolean expression which determines the return code (RC) required to consider a job successful. For example, you can define a successful job as a job that terminates with a return code equal to 3 or with a return code greater than or equal to 5, and less than 10, as follows: rccondsucc "RC=3 OR (RC>=5 AND RC<10)" Job process is terminated A job can be terminated in a number of ways, and in this section we look at some of the more common ones. Keep in mind, however, that it is not the responsibility of IBM Tivoli Workload Scheduler to roll back any actions that a job may have done during the time that it was executing. It is the responsibility of the person creating the script or command to allow for a rollback or recovery action. When a job abends, IBM Tivoli Workload Scheduler can rerun the abended job or stop or continue on with the next job. You can also generate a prompt that needs to be replied to, or launch a recovery job. The full combination of the job flow is shown in Figure 2-9 on page 61.60 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • FAILURE JOB 1 JOB 2 JOB 3 JOB 4 JOB 5 Issue a Recovery Prompt Run Recovery Job JOB3A Stop Rerun Continue JOB 3 JOB 4Figure 2-9 IBM Tivoli Workload Scheduler job flowHere are the details of this job flow: When a job is killed through the conman CLI or Job Scheduling Console, the job will be terminated by terminating the parent process. The termination of any child processes that the parent has started will be the responsibility of the operating system and not IBM Tivoli Workload Scheduler. After the job has been terminated, it will be displayed in the current plan in the Abend state. Any jobs or job streams that are dependent on a killed job are not released. Killed jobs can be rerun. When the process ID is “killed”, either in UNIX or Microsoft operating systems, the job will be terminated by terminating the parent process. The termination of any child processes that the parent has started will be the responsibility of the operating system and not IBM Tivoli Workload Scheduler. After the job has been terminated, it will be displayed in the current plan in the Abend state. Any jobs or job streams that are dependent on a killed job are not released. Killed jobs can be rerun. When the system crashes or is powered off, the job is killed by the crash or by the system being powered down. In that case, when the system is re-booted Chapter 2. High level design and architecture 61
    • and IBM Tivoli Workload Scheduler is restarted, IBM Tivoli Workload Scheduler will check to see if there are any jobs left in the jobtable file: – If jobs are left, IBM Tivoli Workload Scheduler will read the process ID and then go out to see if that process ID is still running. – If no jobs are left, it will mark the job as Abend and the normal recovery action will run.62 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
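Pulling the return code and recovery options together, the following is a minimal sketch of a composer job definition that uses rccondsucc together with a recovery job; the workstation, job names, script path and logon user are hypothetical, and the exact syntax should be checked against the reference manual for your version.

   $JOBS
   TWS1#NIGHTLY_LOAD
     SCRIPTNAME "/opt/batch/nightly_load.sh"
     STREAMLOGON batch
     DESCRIPTION "Nightly load; RC 0, or 5 through 9, is treated as successful"
     RCCONDSUCC "RC=0 OR (RC>=5 AND RC<10)"
     RECOVERY RERUN AFTER TWS1#CLEANUP_LOAD

If NIGHTLY_LOAD ends with any other return code, IBM Tivoli Workload Scheduler launches CLEANUP_LOAD and then reruns the job, which corresponds to the "Run Recovery Job" and "Rerun" paths shown in Figure 2-9.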
    • 3 Chapter 3. High availability cluster implementation In this chapter, we provide step-by-step installation procedures to help you plan and implement an high availability cluster using High Availability Cluster Multiprocessing for AIX (HACMP) and Microsoft Cluster Service (MSCS), for a mutual takeover scenario of Tivoli Framework and Tivoli Workload Scheduler. We cover the following procedures: “Our high availability cluster scenarios” on page 64 “Implementing an HACMP cluster” on page 67 “Implementing a Microsoft Cluster” on page 138© Copyright IBM Corp. 2004. All rights reserved. 63
    • 3.1 Our high availability cluster scenarios With numerous cluster software packages on the market, each offering a variety of configurations, there are many ways of configuring a high availability (HA) cluster. We cannot cover all possible scenarios, so in this redbook we focus on two scenarios which we believe are applicable to many sites: a mutual takeover scenario for IBM Tivoli Workload Scheduler, and a hot standby scenario for IBM Tivoli Management Framework. We discuss these scenarios in detail in the following sections.3.1.1 Mutual takeover for IBM Tivoli Workload Scheduler In our scenario, we assume a customer case where they plan to manage jobs for two mission-critical business applications. They plan to have the two business applications running on separate nodes, and would like to install separate IBM Tivoli Workload Scheduler Master Domain Managers on each node to control the jobs for each application. They are seeking a cost-effective, high availability solution to minimize the downtime of their business application processing in case of a system component failure. Possible solutions for this customer would be the following: Create separate HA clusters for each node by adding two hot standby nodes and two sets of external disks. Create one HA cluster by adding an additional node and a set of external disks. Designate the additional node as a hot standby node for the two application servers. Create one HA cluster by adding a set of external disks. Each node is designated as a standby for the other node. The first two solutions require additional machines to sit idle until a fallover occurs, while the third solution utilizes all machines in a cluster and no node is left to sit idle. Here we assume that the customer chose the third solution. This type of configuration is called a mutual takeover, as discussed in Chapter 2, “High level design and architecture” on page 31. Note that this type of cluster configuration is allowed under the circumstance that the two business applications in question and IBM Tivoli Workload Scheduler itself have no software or hardware restrictions to run on the same physical machine. Figure 3-1 on page 65 shows a diagram of our cluster.64 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Cluster Tivoli Tivoli Management Management Framework Framework Server 1 Server 1 TWS1 (for APP1) TWS TWS Connector1 Connector2 Instance1 TWS2 Instance1 (for APP2) Instance2 Instance2 Node1 Node2Figure 3-1 Overview of our HA cluster scenario In Figure 3-1, node Node1 controls TWS1 and the application APP1. Node Node2 controls TWS2 and application APP2. TWS1 and TWS2 are installed on the shared external disk so that each instance of IBM Tivoli Workload Scheduler could fall over to another node. We assume that system administrators would like to use the Job Scheduling Console (JSC) to manage the scheduling objects and production plans. To enable the use of JSC, Tivoli Management Framework(TMF) and IBM Tivoli Workload Scheduler Connector must be installed. Because each IBM Tivoli Workload Scheduler instance requires a running Tivoli Management Framework Server or a Managed Node, we need two Tivoli Management Region (TMR) servers. Keep in mind that in our scenario, when a node fails, everything installed on the external disk will fall over to another node. Note that it is not officially supported to run two TMR servers or Managed Nodes in one node. So the possible configuration of TMF in this scenario would be to install TMR servers on the local disks of each node. Chapter 3. High availability cluster implementation 65
    • IBM Tivoli Workload Scheduler connector will also be installed on the local disks. To enable JSC access to both IBM Tivoli Workload Scheduler instances during a fallover, each IBM Tivoli Workload Scheduler Connector needs two connector instances defined: Instance1 to control TWS1, and Instance2 to control TWS2.3.1.2 Hot standby for IBM Tivoli Management Framework In our mutual takeover scenario, we cover the high availability scenario for IBM Tivoli Workload Scheduler.Here, we cover a simple hot standby scenario for IBM Tivoli Management Framework (TMF). Because running multiple instances of Tivoli Management Region server (TMR server) on one node is not supported, a possible configuration to provide high availability would be to configure a cluster with the primary node, hot standby node and a disk subsystem. Figure 3-2 shows a simple hot standby HA cluster with two nodes and a shared external disk. IBM Tivoli Management Framework is installed on the shared disk, and normally resides on Node1. When Node1 fails, TMF will fall over to Node2. Cluster Tivoli Management Framework Server Node2 Node1Figure 3-2 A hot standby cluster for a TMR server66 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
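As a sketch of how the two connector instances described in 3.1.1 might be created on each TMR server, assuming your connector level provides the wtwsconn.sh utility (check the connector documentation for the exact options); the instance names and installation paths follow our scenario.

   # Source the Tivoli environment, then create one instance per TWS engine
   . /etc/Tivoli/setup_env.sh
   wtwsconn.sh -create -n Instance1 -t /usr/maestro
   wtwsconn.sh -create -n Instance2 -t /usr/maestro2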
    • 3.2 Implementing an HACMP cluster HACMP is a clustering software provided by IBM for implementing high availability solutions on AIX platforms. In the following sections we describe the process of planning, designing, and implementing a high availability scenario using HACMP. For each implementation procedure discussed in this section, we provide examples by planning an HACMP cluster for IBM Tivoli Workload Scheduler high availability scenario.3.2.1 HACMP hardware considerations As mentioned in Chapter 2, “High level design and architecture” on page 31, the ultimate goal in implementing an HA cluster is to eliminate all possible single points of failure. Keep in mind that cluster software alone does not provide high availability; appropriate hardware configuration is also required to implement a highly available cluster. This applies to HACMP as well. For general hardware considerations about an HA cluster, refer to 2.2, “Hardware configurations” on page 43.3.2.2 HACMP software considerations HACMP not only provides high availability solutions for hardware, but for mission-critical applications that utilize those hardware resources as well. Consider the following before you plan high availability for your applications in an HACMP cluster: Application behavior Licensing Dependencies Automation Robustness Fallback policy For details on what you should consider for each criteria, refer to 2.1.2, “Software considerations” on page 39.3.2.3 Planning and designing an HACMP cluster As mentioned in Chapter 2, “High level design and architecture” on page 31, the sole purpose of implementing an HACMP cluster is to eliminate possible single points of failure in order to provide high availability for both hardware and software. Thoroughly planning the use of both hardware and software components is required prior to HACMP installation. Chapter 3. High availability cluster implementation 67
    • To plan our HACMP cluster, we followed the steps described in HACMP for AIX Version 5.1, Planning and Installation Guide, SC23-4861. Because we cannot cover all possible high availability scenarios, in this section we discuss only the planning tasks needed to run IBM Tivoli Workload Scheduler in a mutual takeover scenario. Planning tasks for a mutual takeover scenario can be extended for a hot standby scenario. The following planning tasks are described in this section. Planning the cluster nodes Planning applications for high availability Planning the cluster network Planning the shared disk device Planning the shared LVM components Planning the resource groups Planning the cluster event processing Use planning worksheets A set of offline and online planning worksheets is provided for HACMP 5.1. For a complete and detailed description of planning an HACMP cluster using these worksheets, refer to HACMP for AIX Version 5.1, Planning and Installation Guide, SC23-4861. By filling out these worksheets, you will be able to plan your HACMP cluster easily. Here we describe some of the offline worksheets. (Note, however, that our description is limited to the worksheets and fields that we used; fields and worksheets that were not essential to our cluster plan are omitted.) Draw a cluster diagram In addition to using these worksheets, it is also advisable to draw a diagram of your cluster as you plan. A cluster diagram should provide an image of where your cluster resources are located. In the following planning tasks, we show diagrams of what we planned in each task. Planning the cluster nodes The initial step in planning an HACMP cluster is to plan the size of your cluster. This is the phase where you define how many nodes and disk subsystems you need in order to provide high availability for your applications. If you plan high availability for one application, a cluster of two nodes and one disk subsystem may be sufficient. If you are planning high availability for two or more applications installed on several servers, you may want to add more than one nodes to provide high availability. You may also need more than one disk subsystem, depending on the amount of data you plan to store on external disks.68 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • For our scenario of a mutual takeover, we plan a cluster with two AIX platforms and an SSA disk subsystem to share. Machine types used in the scenario are a given environment in our lab. When planning for a mutual takeover configuration, make sure that each node has sufficient machine power to perform the job of its own and the job of the other node in the event that a fallover occurs. Otherwise, you may not achieve maximum application performance during a fallover. Figure 3-3 shows a diagram of our cluster node plan. The cluster name is cltivoli. There are two nodes in the cluster, tivaix1 and tivaix2, sharing an external disk subsystem. Each node will run one business application and one instance of IBM Tivoli Workload Scheduler to manage that application. Note that we left some blank space in the diagram for adding cluster resources to this diagram as we plan. In this section and the following sections, we describe the procedures to plan an HACMP cluster using our scenario as an example. Some of the planning tasks may be extended to configure high availability for other applications; however, we are not aware of application-specific considerations and high availability requirements. Cluster: cltivoli Disk subsystem Disk Adapter Disk Adapter tivaix1 tivaix2Figure 3-3 Cluster node plan Chapter 3. High availability cluster implementation 69
    • Planning applications for high availability After you have planned the cluster nodes, the next step is to define where your application executables and data should be located, and how you would like HACMP to control them in the event of a fallover or fallback. For each business application or any other software packages that you plan to make highly available, create an application definition and an application server. Application definition means giving a user-defined name to your application, and then defining the location of your application and how it should be handled in the event of fallover. An application server is a cluster resource that associates the application and the names of specially written scripts to start and stop the application. Defining an application server enables HACMP to resume application processing on the takeover node when a fallover occurs. When planning for applications, the following HACMP worksheets may help to record any required information. Application Worksheet Application Server Worksheet Completing the Application Worksheet The Application Worksheet helps you to define which applications should be controlled by HACMP, and how they should be controlled. After completing this worksheet, you should have at least the following information defined: Application Name Assign a name for each application you plan to put under HACMP control. This is a user-defined name associated with an application. Location of Key Application Files For each application, define the following information for the executables and data. Make sure you enter the full path when specifying the path of the application files. -Directory/path where the files reside -Location (internal disk/external disk) -Sharing (shared/not shared) Cluster Name Name of the cluster where the application resides. Node Relationship Specify the takeover relationship of the nodes in the cluster (choose from cascading, concurrent, or rotating). For a description of each takeover relationship,70 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • refer to HACMP for AIX Version 5.1, Concepts and Facilities Guide, SC23-4864.Fallover Strategy Define the fallover strategy for the application. Specify which node would be the primary and which node will be the takeover. -The primary node will control the application in normal production. -The takeover node will control the application in the event of a fallover deriving from a primary node failure or a component failure on the primary node.Start Commands/Procedures Specify the commands or procedures for starting the application. This is the command or procedure you will write in your application start script. HACMP invokes the application start script in the event of a cluster start or a fallover.Verification Commands Specify the commands to verify that your application is up or running.Stop Commands/Procedures Specify the commands or procedures for stopping the application. This is the command or procedure you will write in your application stop script. HACMP will invoke the application stop script in the event of a cluster shutdown or a fallover.Verification Commands Specify the commands to verify that your application has stopped. Note: Start, stop, and verification commands specified in this worksheet should not require operator intervention; otherwise, cluster startup, shutdown, fallover, and fallback may halt.Table 3-1 on page 72 and Table 3-2 on page 73 show examples of how weplanned IBM Tivoli Workload Scheduler for high availability. Because we plan tohave two instances of IBM Tivoli Workload Scheduler running in one cluster, wedefined two applications, TWS1 and TWS2. In normal production, TWS1 resideson node tivaix1, while TWS2 resides on node tivaix2. Chapter 3. High availability cluster implementation 71
    • Note: If you are installing an IBM Tivoli Workstation Scheduler version older than 8.2, you cannot use /usr/maestro and /usr/maestro2 as the installation directories. Why? Because in such a case, both installations would use the same Unison directory—and the Unison directory should be unique for each installation. Therefore, if installing a version older than 8.2, we suggest using /usr/maestro1/TWS and /usr/maestro2/TWS as the installation directories, which will make the Unison directory unique. For Version 8.2, this is not important, since the Unison directory is not used in this version. Notice that we placed the IBM Tivoli Workload Scheduler file systems on the external shared disk, because both nodes must be able to access the two IBM Tivoli Workload Scheduler instances for fallover. The two instances of IBM Tivoli Workload Scheduler should be located in different file systems to allow both instances of IBM Tivoli Workload Scheduler to run on the same node. Node relationship is set to cascading because each IBM Tivoli Workload Scheduler instance should return to its primary node when it rejoins the cluster. Table 3-1 Application definition for IBM Tivoli Workload Scheduler1 (TWS1) Items to define Value Application Name TWS1 Location of Key Application Files 1. Directory/path where the files 1. /usr/maestro reside 2. external disk 2. Location (internal disk/external 3. shared disk) 3. Sharing (shared/not shared) Cluster Name cltivoli Node Relationship cascading Fallover Strategy tivaix1: primary tivaix2: takeover72 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Items to define Value Start Commands/Procedures 1. run conman start to start IBM Tivoli Workload Scheduler process as maestro user 2. run conman link @; noask to link all FTAs Verification Commands 1. run ps -ef | grep -v grep | grep ‘/usr/maestro’ 2. check that netman, mailman, batchman and jobman are running Stop Commands/Procedures 1. run conman unlink @;noask to unlink all FTAs as maestro user 2. run conman shut to stop IBM Tivoli Workload Scheduler process as maestro user Verification Commands 1. run ps -ef | grep -v grep | grep ‘/usr/maestro’ 2. check that netman, mailman, batchman and jobman are not runningTable 3-2 Application definition for Tivoli Workstation Scheduler2 (TWS2) Items to define Value Application Name TWS2 Location of Key Application Files 1. Directory/path where the files 1. /usr/maestro2 reside 2. external disk 2. Location (internal disk/external 3. shared disk) 3. Sharing (shared/not shared) Cluster Name cltivoli Node Relationship cascading Fallover Strategy tivaix2: primary tivaix1: takeover Chapter 3. High availability cluster implementation 73
Items to define Value
Start Commands/Procedures 1. run conman start to start IBM Tivoli Workload Scheduler process as maestro user 2. run conman link @; noask to link all FTAs
Verification Commands 1. run ps -ef | grep -v grep | grep ‘/usr/maestro2’ 2. check that netman, mailman, batchman and jobman are running
Stop Commands/Procedures 1. run conman unlink @;noask to unlink all FTAs as maestro user 2. run conman shut to stop IBM Tivoli Workload Scheduler process as maestro user
Verification Commands 1. run ps -ef | grep -v grep | grep ‘/usr/maestro2’ 2. check that netman, mailman, batchman and jobman are not running
Completing the Application Server Worksheet
This worksheet helps you to plan the application server cluster resource. Define an application server resource for each application that you defined in the Application Worksheet. If you plan to have more than one application server in a cluster, then add a server name and define the corresponding start/stop script for each application server.
Cluster Name Enter the name of the cluster. This must be the same name you specified for Cluster Name in the Application Worksheet.
Server Name For each application in the cluster, specify an application server name.
Start Script Specify the full path of the application start script for the application server.
Stop Script Specify the full path of the application stop script for the application server.
We defined two application servers, tws_svr1 and tws_svr2 in our cluster; tws_svr1 is for controlling application TWS1, and tws_svr2 is for controlling application TWS2. Table 3-3 shows the values we defined for tws_svr1.
    • Table 3-3 Application server definition for tws_svr1 Items to Value define Cluster Name cltivoli Server Name tws_svr1 Start Script /usr/es/sbin/cluster/scripts/start_tws1 .sh Stop Script /usr/es/sbin/cluster/scripts/stop_tws1. shTable 3-4 shows the values we defined for tws_svr2.Table 3-4 Application server definition for tws_svr2 Items to Value define Cluster Name cltivoli Server Name tws_svr2 Start Script /usr/es/sbin/cluster/scripts/start_tws2 .sh Stop Script /usr/es/sbin/cluster/scripts/stop_tws2. shAfter planning your application, add the information about your applications intoyour diagram. Figure 3-4 shows an example of our cluster diagram populatedwith our application plan. We omitted specifics such as start scripts and stopscripts, because the purpose of the diagram is to show the names and locationsof cluster resources. Chapter 3. High availability cluster implementation 75
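The start and stop scripts named in Table 3-3 and Table 3-4 are ordinary scripts that we write ourselves; HACMP simply invokes them. The following is a minimal sketch of what start_tws1.sh might contain, based on the start commands listed in Table 3-1. It assumes the maestro user's login environment puts the IBM Tivoli Workload Scheduler binaries on the PATH; a production script would add logging and error handling.

   #!/bin/ksh
   # start_tws1.sh - sketch of the HACMP application server start script for TWS1
   su - maestro -c "/usr/maestro/StartUp"          # start netman
   su - maestro -c "conman start"                  # start mailman, batchman, jobman
   su - maestro -c "conman 'link @;noask'"         # link all workstations
   exit 0

A matching stop_tws1.sh would issue conman 'unlink @;noask' followed by conman shut, as listed in Table 3-1, and the tws_svr2 scripts would do the same for the instance in /usr/maestro2, running as that instance's own installation user.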
    • Cluster: cltivoli SSA Disk subsystem SSA SSA tivaix1 tivaix2Figure 3-4 Cluster diagram with applications added Planning the cluster network The cluster network must be planned so that network components (network, network interface cards, TCP/IP subsystems) are eliminated as a single point of failure. When planning the cluster network, complete the following tasks: Design the cluster network topology. Network topology is the combination of IP and non-IP (point-to-point) networks to connect the cluster nodes and the number of connections each node has to each network. Determine whether service IP labels will be made highly available with IP Address Takeover (IPAT) via IP aliases or IPAT via IP Replacement. Also determine whether IPAT will be done with or without hardware address takeover. Service IP labels are relocatable virtual IP label HACMP uses to ensure client connectivity in the event of a fallover. Service IP labels are not bound to a particular network adapter. They can be moved from one adapter to another, or from one node to another.76 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • We used the TCP/IP Network Worksheet, TCP/IP Network Interface Worksheet,and Point-to-point Networks Worksheet to plan our cluster network.Completing the TCP/IP Network WorksheetEnter information about all elements of your TCP/IP network that you plan tohave in your cluster. The following items should be identified when you completethis worksheet.Cluster Name The name of your cluster.Then, for each network, specify the following.Network Name Assign a name for the network.Network Type Enter the type of the network (Ethernet, Token Ring, and so on.)Netmask Enter the subnet mask for the network.Node Names Enter the names of the nodes you plan to include in the network.IPAT via IP Aliases Choose whether to enable IP Address Takeover (IPAT) over IP Aliases or not. If you do not enable IPAT over IP Aliases, it will be IPAT via IP Replacement. For descriptions of the two types of IPAT, refer to HACMP for AIX Version 5.1, Planning and Installation Guide, SC23-4861.IP Address Offset for Heartbeating over IP Aliases Complete this field if you plan heartbeating over IP Aliases. For a detailed description, refer to HACMP for AIX Version 5.1, Planning and Installation Guide, SC23-4861.Table 3-5 on page 81 lists the values we specified in the worksheet. We definedone TCP/IP network called net_ether_01. Note: A network in HACMP is a group of network adapters that will share one or more service IP labels. Include all physical and logical network that act as a backup for one another in one network. For example, if two nodes are connected to two redundant physical networks, then define one network to include the two physical networks. Chapter 3. High availability cluster implementation 77
    • Table 3-5 TCP/IP Network definition Cluster Name cltivoli Network Network Type Netmask Node Names IPAT via IP IP Address Name Aliases Offset for Heartbeating over IP Aliases net_ether_01 Ethernet 255.255.255.0 tivaix1, tivaix2 enable 172.16.100.1 Completing the TCP/IP Network Interface Worksheet After you have planned your TCP/IP network definition, plan your network Interface. Associate your IP labels and IP address to network interface. When you complete this worksheet, the following items should be defined. Complete this worksheet for each node you plan to have in your cluster. Node Name Enter the node name. IP Label Assign an IP label for each IP Address you plan to have for the node. Network Interface Assign a physical network interface (for example, en0, en1) to the IP label. Network Name Assign an HACMP network name. This network name must be one of the networks you defined in the TCP/IP Network Worksheet. Interface Function Specify the function of the interface and whether the interface is service, boot or persistent. Note: In HACMP, there are several kinds of IP labels you can define. A boot IP label is a label that is bound to one particular network adapter. This label is used when the system starts. A Service IP label is a label that is associated with a resource group and is able to move from one adapter to another on the same node, or from one node to another. It floats among the physical TCP/IP network interfaces to provide IP address consistency to an application serviced by HACMP. This IP label exists only when cluster is active. A Persistent IP label is a label bound to a particular node. This IP label also floats among two or more adapters in one node, to provide constant access to a node, regardless of the cluster state. IP Address Associate an IP address to the IP label.78 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
Netmask Enter the netmask.
Hardware Address Specify the hardware address of the network adapter if you plan IPAT with hardware address takeover.
Table 3-6 and Table 3-7 show the values we entered in our worksheet. We omitted the hardware address because we do not plan to have hardware address takeover.
Table 3-6 TCP/IP network interface plan for tivaix1
Node Name tivaix1
IP Label       Network Interface   Network Name    Interface Function   IP Address        Netmask
tivaix1_svc    -                   net_ether_01    service              9.3.4.3           255.255.254.0
tivaix1_bt1    en0                 net_ether_01    boot                 192.168.100.101   255.255.254.0
tivaix1_bt2    en1                 net_ether_01    boot                 10.1.1.101        255.255.254.0
tivaix1        -                   net_ether_01    persistent           9.3.4.194         255.255.254.0
Table 3-7 TCP/IP network interface plan for tivaix2
Node Name tivaix2
IP Label       Network Interface   Network Name    Interface Function   IP Address        Netmask
tivaix2_svc    -                   net_ether_01    service              9.3.4.4           255.255.254.0
tivaix2_bt1    en0                 net_ether_01    boot                 192.168.100.102   255.255.254.0
tivaix2_bt2    en1                 net_ether_01    boot                 10.1.1.102        255.255.254.0
tivaix2        -                   net_ether_01    persistent           9.3.4.195         255.255.254.0
Completing the Point-to-Point Networks Worksheet
You may need a non-TCP/IP point-to-point network in the event of a TCP/IP subsystem failure. The Point-to-Point Networks Worksheet helps you to plan non-TCP/IP point-to-point networks. When you complete this worksheet, you should have the following items defined.
Cluster name Enter the name of your cluster.
Then, for each of your point-to-point networks, enter the values for the following items:
Network Name Enter the name of your point-to-point network.
    • Network Type Enter the type of your network (disk heartbeat, Target Mode SCSI, Target Mode SSA, and so on). Refer to HACMP for AIX Version 5.1, Planning and Installation Guide, SC23-4861, for more information. Node Names Enter the name of the nodes you plan to connect with the network. Hdisk Enter the name of the physical disk (required only for disk heartbeat networks). Table 3-8 on page 91 lists the definition for point-to-point network we planned in our scenario. We omitted the value for Hdisk because we did not plan disk heartbeats. Table 3-8 Point-to-point network definition Cluster Name cltivoli Network Name Network Type Node Names net_tmssa_01 Target Mode SSA tivaix1, tivaix2 After you have planned your network, add your network plans to the diagram. Figure 3-5 on page 81 shows our cluster diagram with our cluster network plans added. There is a TCP/IP network definition net_ether_01. For a point-to-point network, we added net_tmssa_01. For each node, we have two boot IP labels, a service IP label and a persistent IP label.80 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Cluster: cltivoli service tivaix1_svc, service tivaix2_svc, persistent tivaix1 persistent tivaix2 boot1 tivaix1_bt1 boot2: tivaix1_bt2 net_ether_01 SSA Disk subsystem en0 en0 en1 en1 net_tmssa_01 SSA SSA tivaix1 tivaix2 IP labels and address for IP labels and address for tivaix1: tivaix2: tivaix1_bt1: 192.168.100.101 tivaix2_bt1: 192.168.100.102 tivaix1_bt2: 10.1.1.101 tivaix2_bt2: 10.1.1.102 tivaix1_svc: 9.3.4.3 tivaix2_svc: 9.3.4.4 tivaix1: 9.3.4.194 tivaix2: 9.3.4.195Figure 3-5 Cluster diagram with network topology added Planning the shared disk device Shared disk is an essential part of HACMP cluster. It is usually one or more external disks shared among two or more cluster nodes. In a non-concurrent configuration, only one node at a time has control of the disks. If a node fails in a cluster, the node with the next highest priority in the cluster acquires the ownership of the disks and restarts applications to restore mission-critical services. This ensures constant access to application executables and data stored on those disks. When you complete this task, at a minimum the following information should be defined: Type of shared disk technology The number of disks required The number of disk adapters Chapter 3. High availability cluster implementation 81
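A quick way to confirm that both nodes see the planned shared disks is to list the disk devices on each node; the output below is illustrative only, and device numbering and location codes will differ on your systems.

   # Run on each node
   lsdev -Cc disk
     hdisk0 Available 10-60-00-8,0  16 Bit SCSI Disk Drive
     hdisk6 Available 17-08-L       SSA Logical Disk Drive
     hdisk7 Available 17-08-L       SSA Logical Disk Drive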
    • HACMP supports several disk technologies, such as SCSI and SSA. For a complete list of supported disk device, consult your service provider. We used an SSA disk subsystem for our scenario, because this was the given environment of our lab. Because we planned to have two instances of IBM Tivoli Workload Scheduler installed in separate volume groups, we needed at least two physical disks. Mirroring SSA disks is recommended, as mirroring an SSA disk enables the replacement of a failed disk drive without powering off entire system. Mirroring requires an additional disk for each physical disk, so the minimum number of disks would be four physical disks. To avoid having disk adapters become single points of failure, redundant disk adapters are recommended. In our scenario, we had one disk adapter for each node, due to the limitations of our lab environment. Figure 3-6 on page 83 shows a cluster diagram with at least four available disks in the SSA subsystem. While more than one disk adapter per node is recommended, we only have one disk adapter on each node due to the limitations of our environment.82 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Cluster: cltivoli service tivaix1_svc, service tivaix2_svc, persistent tivaix1 persistent tivaix2 boot1 tivaix1_bt1 boot1 tivaix2_bt1 boot2 tivaix1_bt2 net_ether_01 boot2 tivaix2_bt2 en0 en0 en1 en1 net_tmssa_01 SSA SSA tivaix1 tivaix2 IP labels and address for IP labels and address for tivaix1: tivaix2: tivaix1_bt1: 192.168.100.101 tivaix2_bt1: 192.168.100.102 tivaix1_bt2: 10.1.1.101 tivaix2_bt2: 10.1.1.102 tivaix1_svc: 9.3.4.3 tivaix2_svc: 9.3.4.4 tivaix1: 9.3.4.194 tivaix2: 9.3.4.195Figure 3-6 Cluster diagram with disks added Planning the shared LVM components AIX uses Logical Volume Manager (LVM) to manage disks. LVM components (physical volumes, volume groups, logical volumes, file systems) maps data between physical and logical storage. For more information on AIX LVM, refer to AIX System Management Guide. To share and control data in an HACMP cluster, you need to define LVM components. When planning for LVM components, we used the Shared Volume Group/Filesystem Worksheet. Chapter 3. High availability cluster implementation 83
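Before filling in the worksheet that follows, it is worth listing the physical volumes and their PVIDs on each node; as noted below, the PVID of a shared disk must be identical on every node even though the hdisk number may differ. The output shown is illustrative only.

   # Run on each node and compare the 16-digit PVIDs
   lspv
     hdisk0  0001813f0d633a72  rootvg  active
     hdisk6  0001813fa8707a2e  None
     hdisk7  0001813fb2f51c43  None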
    • Completing the Shared Volume Group/Filesystem Worksheet For each field in the worksheet, you should have at least the following information defined. This worksheet should be completed for each shared volume group you plan to have in your cluster. Node Names Record the node name of the each node in the cluster. Shared Volume Group Name Specify a name for the volume group shared by the nodes in the cluster. Major Number Record the planned major number for the volume group. This field could be left blank to use the system default if you do not plan to have NFS exported filesystem. When configuring shared volume group, take note of the major number. You may need this when importing volume groups on peer nodes. Log Logical Volume Name Specify a name for the log logical volume (jfslog). The name of the jfslog must be unique in the cluster. (Do not use the system default name, because a log logical name on another node may be assigned the identical name.) When creating jfslog, make sure you rename it to the name defined in this worksheet. Physical Volumes For each node, record the names of physical volumes you plan to include in the volume group. Physical volume names may differ by node, but PVIDs (16-digit IDs for physical volumes) for the shared physical volume must be the same on all nodes. To check the PVID, use the lspv command. Then, for each logical volume you plan to include in the volume group, fill out the following information: Logical Volume Name Assign a name for the logical volume. Number of Copies of Logical Partition Specify the number of copies of the logical volume. This number is needed for mirroring the logical volume. If you plan mirroring, the number of copies must be 2 or 3.84 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• Filesystem Mount Point Assign a mount point for the file system on the logical volume.
Size Specify the size of the file system in 512-byte blocks.
Table 3-9 and Table 3-10 show the definition of volume groups planned for our scenario. Because we plan to have shared volume groups for each instance of IBM Tivoli Workload Scheduler, we defined volume groups tiv_vg1 and tiv_vg2. Then, we defined one logical volume in each of the volume groups to host a file system. We assigned major numbers instead of using the system default, but this is not mandatory when you are not using NFS exported file systems.
Table 3-9 Definitions for shared volume groups/file system (tiv_vg1)
Items to define Value
Node Names tivaix1, tivaix2
Shared Volume Group Name tiv_vg1
Major Number 45
Log Logical Volume name lvtws1_log
Physical Volumes on tivaix1 hdisk6
Physical Volumes on tivaix2 hdisk7
Logical Volume Name lvtws1
Number of Copies 2
Filesystem Mount point /usr/maestro
Size 1048576
Table 3-10 Definitions for shared volume groups/file system (tiv_vg2)
Items to define Value
Node Names tivaix1, tivaix2
Shared Volume Group Name tiv_vg2
Major Number 46
Log Logical Volume name lvtws2_log
Physical Volumes on tivaix1 hdisk7
Physical Volumes on tivaix2 hdisk20
Logical Volume Name lvtws2
Chapter 3. High availability cluster implementation 85
• Items to define Value
Number of Copies 2
Filesystem Mount point /usr/maestro2
Size 1048576
Figure 3-7 on page 86 shows the cluster diagram with the shared LVM components added.
Cluster: cltivoli. On tivaix1: service tivaix1_svc, persistent tivaix1, boot1 tivaix1_bt1, boot2 tivaix1_bt2 (adapters en0 and en1). On tivaix2: service tivaix2_svc, persistent tivaix2, boot1 tivaix2_bt1, boot2 tivaix2_bt2 (adapters en0 and en1). Networks net_ether_01 and net_tmssa_01; shared SSA disks. IP labels and addresses for tivaix1: tivaix1_bt1 192.168.100.101, tivaix1_bt2 10.1.1.101, tivaix1_svc 9.3.4.3, tivaix1 9.3.4.194. IP labels and addresses for tivaix2: tivaix2_bt1 192.168.100.102, tivaix2_bt2 10.1.1.102, tivaix2_svc 9.3.4.4, tivaix2 9.3.4.195.
Figure 3-7 Cluster diagram with shared LVM added
86 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
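As a quick sanity check of the Size values in Table 3-9 and Table 3-10: the worksheet expresses the file system size in 512-byte blocks, so 1048576 blocks corresponds to 512 MB, the same value entered later when the journaled file system is created. A one-line check (a sketch using ksh arithmetic):
# echo $((1048576 * 512 / 1024 / 1024))
512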
• Planning the resource groups
A resource group refers to a set of resources that move together from one node to another in the event of an HACMP fallover or fallback. A resource group usually consists of one or more volume groups and a service IP address. For this task, we used the Resource Group Worksheet. One worksheet must be completed for each resource group that you plan. The following items should be defined when you complete the worksheet.
Cluster Name Specify the name of the cluster where the resource group resides. This should be the name that you defined when planning the cluster nodes.
Resource Group Name Assign a name for the resource group you are planning.
Management Policy Choose the management policy of the resource group (Cascading, Rotating, Concurrent or Custom). For details on management policies, refer to HACMP for AIX Version 5.1, Concepts and Facilities Guide, SC23-4864.
Participating Nodes/Default Node Priority Specify the names of the nodes that may acquire the resource group. When specifying the nodes, make sure the nodes are listed in the order of their priority (nodes with higher priority should be listed first).
Service IP Label Specify the service IP label for IP Address Takeover (IPAT). This IP label is associated with the resource group, and it is transferred to another adapter or node in the event of a resource group fallover.
Volume Groups Specify the name of the volume group(s) to include in the resource group.
Filesystems Specify the names of the file systems to include in the resource group.
Chapter 3. High availability cluster implementation 87
    • Note: There is no need to specify file system names if you have specified a name of a volume group, because all the file systems in the specified volume group will be mounted by default. In the worksheet, leave the file system field blank unless you need to include individual file systems. Filesystems Consistency Check Specify fsck or logredo. This is the method to check consistency of the file system. Filesystem Recovery Method Specify parallel or sequential. This is the recovery method for the file systems. Automatically Import Volume Groups Set it to true if you wish to have volume group imported automatically to any cluster nodes in the resource chain. Inactive Takeover Set it to true or false. If you want the resource groups acquired only by the primary node, set this attribute to false. Cascading Without Fallback Activated Set it to true or false. If you set this to true, then a resource group that has failed over to another node will not fall back automatically in the event that its primary node rejoins the cluster. This option is useful if you do not want HACMP to move resource groups during application processing. Disk Fencing Activated Set it to true or false. File systems Mounted before IP Configured Set it true or false. Table 3-11 on page 89 and Table 3-12 on page 89 show how we planned our resource groups. We defined one resource group for each of the two instances of IBM Tivoli Workload Scheduler, rg1 and rg2. Notice that we set Inactive Takeover Activated to false, because we want the resource group to always be acquired by the node that has the highest priority in the resource chain.88 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• We set Cascading Without Fallback Activated to true because we do not want IBM Tivoli Workload Scheduler to fall back to the original node while jobs are running.
Table 3-11 Definition for resource group tws1_rg
Items to define Value
Cluster Name cltivoli
Resource Group Name rg1
Management policy cascading
Participating Nodes/Default Node Priority tivaix1, tivaix2
Service IP Label tivaix1_svc
Volume Groups tiv_vg1
Filesystems Consistency Check fsck
Filesystem Recovery Method sequential
Automatically Import Volume Groups false
Inactive Takeover Activated false
Cascading Without Fallback Activated true
Disk Fencing Activated false
File systems Mounted before IP Configured false
Table 3-12 Definition for resource group tws1_rg2
Items to define Value
Cluster Name cltivoli
Resource Group Name rg2
Management policy cascading
Participating Nodes/Default Node Priority tivaix2, tivaix1
Service IP Label tivaix2_svc
Volume Groups tiv_vg2
Chapter 3. High availability cluster implementation 89
• Items to define Value
Filesystems Consistency Check fsck
Filesystem Recovery Method sequential
Automatically Import Volume Groups false
Inactive Takeover Activated false
Cascading Without Fallback Activated true
Disk Fencing Activated false
File systems Mounted before IP Configured false
Figure 3-8 on page 91 shows the cluster diagram with the resource groups added.
90 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Cluster: cltivoli service tivaix1_svc, service tivaix2_svc, persistent tivaix1 persistent tivaix2 boot1 tivaix1_bt1 boot1 tivaix2_bt1 boot2 tivaix1_bt2 net_ether_01 SSA Disk subsystem en0 en0 en1 en1 SSA SSA tivaix1 tivaix2 Resource Group : rg1 Resource Group: rg2 Service IP Label : Service IP Label: tivaix1_svc tivaix2_svc Volume Group: tiv_vg1 Volume Group : tiv_vg2 IP labels and address for IP labels and address for tivaix1: tivaix2: tivaix1_bt1: 192.168.100.101 tivaix2_bt1: 192.168.100.102 tivaix1_bt2: 10.1.1.101 tivaix2_bt2: 10.1.1.102 tivaix1_svc: 9.3.4.3 tivaix2_svc: 9.3.4.4 tivaix1: 9.3.4.194 tivaix2: 9.3.4.195Figure 3-8 Cluster diagram with resource group added Planning the cluster event processing A cluster event is a change of status in the cluster. For example, if a node leaves the cluster, that is a cluster event. HACMP takes action based on these events by invoking scripts related to each event. A default set of cluster events and related scripts are provided. If you want some specific action to be taken on an occurrence of these events, you can define a command or script to execute before/after each event. You may also define events of your own. For details on cluster events and customizing events to tailor your needs, refer to HACMP documentation. In this section, we give you an example of customized cluster event processing. In our scenario, we planned our resource group with CWOF because we do not Chapter 3. High availability cluster implementation 91
    • want HACMP to fallback IBM Tivoli Workload Scheduler during job execution. However, this leaves two instances of IBM Tivoli Workload Scheduler running on one node, even after the failed node has reintegrated into the cluster. The resource group must be manually transferred to the reintegrated node, or some implementation must be done to automate this procedure. Completing the Cluster Event Worksheet To plan cluster event processing, you will need to define several items. The Cluster Event Worksheet helps you to plan your cluster events. Here we describe the items that we defined for our cluster events. Cluster Name The name of the cluster. Cluster Event Name The name of the event you would like to configure. Post-Event Command The name of the command or script you would like to execute after the cluster event you specified in the Cluster Event Name field. Table 3-13 shows the values we defined for each item. Table 3-13 Definition for cluster event Items to define Value Cluster Name cltivoli Cluster Event Name node_up_complete Post-Event Command /usr/es/sbin/cluster/sh/quiesce_tws.sh3.2.4 Installing HACMP 5.1 on AIX 5.2 This section provides step-by-step instructions for installing HACMP 5.1 on AIX 5.2. First we cover the steps to prepare the system for installing HACMP, then we go through the installation and configuration steps. Preparation Before you install HACMP software, complete the following tasks: Meet all hardware and software requirements Configure the disk subsystems Define the shared LVM components Configure Network Adapters92 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• Meet all hardware and software requirements
Make sure your system meets the hardware and software requirements for the HACMP software. The requirements may vary based on the hardware type and software version that you use. Refer to the release notes for requirements.
Configure the disk subsystems
Disk subsystems are an essential part of an HACMP cluster. The external disk subsystems enable physically separate nodes to share the same set of disks. Disk subsystems must be cabled and configured properly so that all nodes in a cluster are able to access the same set of disks. Configuration may differ depending on the type of disk subsystem you use. In our scenario, we used an IBM 7133 Serial Storage Architecture (SSA) Disk Subsystem Model 010. Figure 3-9 shows how we cabled our 7133 SSA Disk Subsystem.
Node1 and Node2, each with an SSA Adapter; 7133 Unit containing Disk Group1, Disk Group2, Disk Group3, and Disk Group4 (front and back)
Figure 3-9 SSA Cabling for high availability scenario
Chapter 3. High availability cluster implementation 93
    • The diagram shows a single 7133 disk subsystem containing eight disk drives connected between two nodes in a cluster. Each node has one SSA Four Port Adapter. The disk drives in the 7133 are cabled to the two machines in two loops. Notice that there is a loop that connects Disk Group1 and the two nodes, and another loop that connects Disk Group2 and the two nodes. Each loop is connected to a different port pair on the SSA Four Port Adapters, which enables the two nodes to share the same set of disks. Once again, keep in mind that this is only an example scenario of a 7133 disk subsystem configuration. Configuration may vary depending on the hardware you use. Consult your system administrator for precise instruction on configuring your external disk device. Important: In our scenario, we used only one SSA adapter per node. In actual production environments, we recommend that an additional SSA adapter be added to each node to eliminate single points of failure. Define the shared LVM components Prior to installing HACMP, shared LVM components such as volume groups and file systems must be defined. In this section, we provide a step-by-step example of the following tasks: Defining volume groups Defining file systems Renaming logical volumes Importing volume groups Testing volume group migrations Defining volume groups 1. Log in as root user on tivaix1. 2. Open smitty. The following command takes you to the Volume Groups menu. # smitty vg a. In the Volume Groups menu, select Add a Volume Group as seen in Figure 3-10 on page 95.94 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• Volume Groups
Move cursor to desired item and press Enter.
[TOP]
List All Volume Groups
Add a Volume Group
Set Characteristics of a Volume Group
List Contents of a Volume Group
Remove a Volume Group
Activate a Volume Group
Deactivate a Volume Group
Import a Volume Group
Export a Volume Group
Mirror a Volume Group
Unmirror a Volume Group
Synchronize LVM Mirrors
Back Up a Volume Group
Remake a Volume Group
Preview Information about a Backup
[MORE...4]
F1=Help F2=Refresh F3=Cancel Esc+8=Image
Esc+9=Shell Esc+0=Exit Enter=Do
Figure 3-10 Volume Group SMIT menu
b. In the Add a Volume Group screen (Figure 3-11 on page 96), enter the following values for each field. Note that physical volume names and the volume group major number may vary according to your system configuration.
VOLUME GROUP name: tiv_vg1
Physical Partition SIZE in megabytes: 4
PHYSICAL VOLUME names: hdisk6, hdisk7
Activate volume group AUTOMATICALLY at system restart?: no
Volume Group MAJOR NUMBER: 45
Create VG Concurrent Capable?: no
Chapter 3. High availability cluster implementation 95
    • Add a Volume Group Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] VOLUME GROUP name [tiv_vg1] Physical partition SIZE in megabytes 4 + * PHYSICAL VOLUME names [hdisk6 hdisk7] + Force the creation of a volume group? no + Activate volume group AUTOMATICALLY yes + at system restart? Volume Group MAJOR NUMBER [45] +# Create VG Concurrent Capable? no + Create a big VG format Volume Group? no + LTG Size in kbytes 128 + F1=Help F2=Refresh F3=Cancel F4=List Esc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=Image Esc+9=Shell Esc+0=Exit Enter=DoFigure 3-11 Defining a volume group c. Verify that the volume group you specified in the previous step (step d) is successfully added and varied on. # lsvg -o Example 3-1 shows the command output. With the -o option, you will only see the volume groups that are successfully varied on. Notice that volume group tiv_vg1 is added and varied on. Example 3-1 lvsg -o output # lsvg -o tiv_vg1 rootvg Defining file systems 1. To create a file system, enter the following command. This command takes you to the Add a Journaled File System menu. # smitty crjfs96 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • 2. Select Add a Standard Journaled File System (Figure 3-12). You are prompted to select a volume group in which the shared filesystem should reside. Select the shared volume group that you defined previously, and proceed to the next step. Add a Journaled File System Move cursor to desired item and press Enter. Add a Standard Journaled File System Add a Compressed Journaled File System Add a Large File Enabled Journaled File System F1=Help F2=Refresh F3=Cancel Esc+8=Image Esc+9=Shell Esc+0=Exit Enter=DoFigure 3-12 Add a Journaled File System menu 3. Specify the following values for the new journaled file system. Volume Group Name:tiv_vg1 SIZE of file system unit size:Megabytes Number of Units: 512 MOUNT POINT: /usr/maestro Mount AUTOMATICALLY at system restart?:no Start Disk Accounting: no Chapter 3. High availability cluster implementation 97
    • Note: When creating a file system that will be put under control of HACMP, do not set the attribute of Mount AUTOMATICALLY at system restart to YES. HACMP will mount the file system after cluster start. Figure 3-13 shows our selections. Add a Standard Journaled File System Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] Volume group name tiv_vg1 SIZE of file system Unit Size Megabytes + * Number of units [512] # * MOUNT POINT [/usr/maestro] Mount AUTOMATICALLY at system restart? no + PERMISSIONS read/write + Mount OPTIONS [] + Start Disk Accounting? no + Fragment Size (bytes) 4096 + Number of bytes per inode 4096 + Allocation Group Size (MBytes) 8 + F1=Help F2=Refresh F3=Cancel F4=List Esc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=Image Esc+9=Shell Esc+0=Exit Enter=DoFigure 3-13 Defining a journaled file system 4. Mount the file system using the following command: # mount /usr/maestro 5. Using the following command, verify that the filesystem is successfully added and mounted: # lsvg -l tiv_vg198 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• Example 3-2 shows a sample of the command output.
Example 3-2 lsvg -l tiv_vg1 output
tiv_vg1:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
loglv00 jfslog 1 1 1 open/syncd N/A
lv06 jfs 1 1 1 open/syncd /usr/maestro
6. Unmount the file system using the following command:
# umount /usr/maestro
Renaming logical volumes
Before we proceed to configuring network adapters, we need to rename the logical volumes for the file system we created. This is because in an HACMP cluster, all shared logical volumes need to have unique names.
1. Determine the name of the logical volume and the log logical volume by entering the following command.
# lsvg -l tiv_vg1
Example 3-3 shows the command output. Note that the log logical volume is loglv00, and the logical volume for the file system is lv06.
Example 3-3 lsvg -l tiv_vg1 output
# lsvg -l tiv_vg1
tiv_vg1:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
loglv00 jfslog 1 1 1 closed/syncd N/A
lv06 jfs 1 1 1 closed/syncd /usr/maestro
2. Enter the following command. This will take you to the Change a Logical Volume menu.
# smitty chlv
3. Select Rename a Logical Volume (see Figure 3-14 on page 100).
Chapter 3. High availability cluster implementation 99
    • Change a Logical Volume Move cursor to desired item and press Enter. Change a Logical Volume Rename a Logical Volume F1=Help F2=Refresh F3=Cancel Esc+8=Image Esc+9=Shell Esc+0=Exit Enter=DoFigure 3-14 Changing a Logical Volume menu 4. Select or type the current logical volume name, and enter the new logical volume name. In our example, we use lv06 for the current name, and lvtiv1 for the new name (see Figure 3-15 on page 101).100 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Rename a Logical Volume Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] * CURRENT logical volume name [lv06] + * NEW logical volume name [lvtiv1] F1=Help F2=Refresh F3=Cancel F4=List Esc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=Image Esc+9=Shell Esc+0=Exit Enter=DoFigure 3-15 Renaming a logical volume 5. Perform the steps 1 through 4 for the logical log volume. We specified the name of the current logical log volume name loglv00 and the new logical volume name as lvtiv1_log. 6. Verify that the logical volume name has been changed successfully by entering the following command. # lsvg -l tiv_vg1 Example 3-4 shows the command output. Example 3-4 Command output of lsvg # lsvg -l tiv_vg1 tiv_vg1: LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT lvtws1_log jfslog 1 2 2 open/syncd N/A lvtws1 jfs 512 1024 2 open/syncd /usr/maestro 7. After renaming the logical volume and the logical log volume, check the entry for the file system in the /etc/filesystems file. Make sure the attributes dev and log reflect the change. The value for dev should be the new name for the Chapter 3. High availability cluster implementation 101
    • logical volume, while the value for log should be the name of the jfs log volume. If the log attributes do not reflect the change, issue the following command. (We used /dev/lvtws1_log in our example.) # chfs -a log=/dev/lvtws1_log /usr/maestro Example 3-5 shows how the entry for the file system should look in the /etc/filesystems file. Notice that the value for attribute dev is the new logical volume name(/dev/lvtws1), and the value for attribute log is the new logical log volume name (/dev/lvtws1_log). Example 3-5 An entry in the /etc/filesystems file /usr/maestro: dev = /dev/lvtws1 vfs = jfs log = /dev/lvtws1_log mount = false options = rw account = false Importing the volume groups At this point, you should have a volume group and a file system defined on one node. The next step is to set up the volume group and the file system so that the both nodes are able to access them. We do this by importing the volume groups from the source node to destination node. In our scenario, we import volume group tiv_vg1 to tivaix2. The following steps describe how to import a volume group from one node to another. In these steps we refer to tivaix1 as the source server, and tivaix2 as the destination server. 1. Log in to the source server. 2. Check the physical volume name and the physical volume ID of the disk in which your volume group reside. In Example 3-6 on page 103, notice that the first column indicates the physical volume name, and the second column indicates the physical volume ID. The third column shows which volume group resides on each physical volume. Check the physical volume ID (shown in the second column) for the physical volumes related to your volume group, as this information is required in the following steps to come. # lspv Example 3-6 on page 103 shows example output from tivaix1. You can see that volume group tiv_vg1 resides on hdisk6 and hdisk7.102 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Example 3-6 Output of an lspv command# lspvhdisk0 0001813fe67712b5 rootvg activehdisk1 0001813f1a43a54d rootvg activehdisk2 0001813f95b1b360 rootvg activehdisk3 0001813fc5966b71 rootvg activehdisk4 0001813fc5c48c43 Nonehdisk5 0001813fc5c48d8c Nonehdisk6 000900066116088b tiv_vg1 activehdisk7 000000000348a3d6 tiv_vg1 activehdisk8 00000000034d224b Nonehdisk9 none Nonehdisk10 none Nonehdisk11 none Nonehdisk12 00000000034d7fad None3. Vary off tiv_vg1 from the source node: # varyoffvg tiv_vg14. Log into the destination node as root.5. Check the physical volume name and ID on the destination node. Look for the same physical volume ID that you identified in step 2.). Example 3-7 shows output of the lspv command run on node tivaix2. Note that hdisk5 has the same physical volume id as hdisk6 on tivaix1, and hdisk6 has the same physical volume ID as hdisk7 on tivaix1. # lspvExample 3-7 Output of lspv on node tivaix2# lspvhdisk0 0001814f62b2a74b rootvg activehdisk1 none Nonehdisk2 none Nonehdisk3 none Nonehdisk4 none Nonehdisk5 000900066116088b Nonehdisk6 000000000348a3d6 None1hdisk7 00000000034d224b tiv_vg2 activehdisk16 0001814fe8d10853 Nonehdisk17 none Nonehdisk18 none Nonehdisk19 none Nonehdisk20 00000000034d7fad tiv_vg2 active Chapter 3. High availability cluster implementation 103
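When a node has many attached disks, comparing PVIDs by eye is error prone. A small helper such as the following, run on each node, prints only the PVID, hdisk name, and volume group for disks that have a PVID, sorted so that the listings from the two nodes can be compared side by side (a sketch):
# lspv | awk '$2 != "none" {print $2, $1, $3}' | sort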
    • Importing volume groups To import a volume group, enter the following command. This will take you to the Import a Volume Group screen. # smitty importvg 1. Specify the following values. VOLUME GROUP name: tiv_vg1 PHYSICAL VOLUME name:hdisk5 Volume Group MAJOR NUMBER: 45 Note: The physical volume name has to be the one with the same physical disk id that the importing volume group resides on. Also, note that the value for Volume Group MAJOR NUMBER should be the same value as specified when creating the volume group. Our selections are shown in Figure 3-16 on page 105.104 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Import a Volume Group Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] VOLUME GROUP name [tiv_vg1] * PHYSICAL VOLUME name [hdisk6] + Volume Group MAJOR NUMBER [45] +# F1=Help F2=Refresh F3=Cancel F4=List Esc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=Image Esc+9=Shell Esc+0=Exit Enter=DoFigure 3-16 Import a Volume Group 2. Use the following command to verify that the volume group is imported on the destination node. # lsvg -o Example 3-8 shows the command output on the destination node. Note that tiv_vg1 is now varied on to tivaix2 and is available. Example 3-8 lsvg -o output # lsvg -o tiv_vg1 rootvg Note: By default, the imported volume group is set to be varied on automatically at system restart. In an HACMP cluster, the HACMP software varies on the volume group. We need to change the property of the volume group so that it will not be automatically varied on at system restart. Chapter 3. High availability cluster implementation 105
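As an aside, the import and the property change described in the note can also be done directly from the command line. The following is a sketch using the names from our environment; the hdisk name, volume group name, and major number will differ elsewhere:
# importvg -V 45 -y tiv_vg1 hdisk5    # import the volume group with its planned major number
# chvg -a n tiv_vg1                   # do not vary it on automatically at system restart
The SMIT steps that follow make the same change to the auto-varyon attribute.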
    • 3. Enter the following command. # smitty chvg 4. Select the volume group imported in the previous step. In our example, we use tiv_vg1 (Figure 3-17). Change a Volume Group Type or select a value for the entry field. Press Enter AFTER making all desired changes. [Entry Fields] * VOLUME GROUP name [tiv_vg1] + F1=Help F2=Refresh F3=Cancel F4=List Esc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=Image Esc+9=Shell Esc+0=Exit Enter=DoFigure 3-17 Changing a Volume Group screen 5. Specify the following, as seen in Figure 3-18 on page 107. Activate volume group AUTOMATICALLY at system restart: no106 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Change a Volume Group Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] * VOLUME GROUP name tiv_vg1 * Activate volume group AUTOMATICALLY no + at system restart? * A QUORUM of disks required to keep the volume yes + group on-line ? Convert this VG to Concurrent Capable? no + Change to big VG format? no + LTG Size in kbytes 128 + Set hotspare characteristics n + Set synchronization characteristics of stale n + partitions F1=Help F2=Refresh F3=Cancel F4=List Esc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=Image Esc+9=Shell Esc+0=Exit Enter=DoFigure 3-18 Changing the properties of a volume group Note: At this point, you should now have shared resources defined on one of the nodes. Perform steps “Defining the file systems” through “Testing volume group migrations” to define another set of shared resources that reside on the other node. Testing volume group migrations You should manually test the migration of volume groups between cluster nodes before installing HACMP, to ensure each cluster node can use every volume group. To test volume group migrations in our environment: 1. Log on to tivaix1 as root user. 2. Ensure all volume groups are available. Run the command lsvg. You should see local volume group(s) like rootvg, and all shared volume groups. In our environment, we see the shared volume groups tiv_vg1 and tiv_vg2 from the SSA disk subsystem, as shown in Example 3-9 on page 108. Chapter 3. High availability cluster implementation 107
    • Example 3-9 Verifying all shared volume groups are available on a cluster node [root@tivaix1:/home/root] lsvg rootvg tiv_vg1 tiv_vg2 3. While all shared volume groups are available, they should not be online. Use the following command to verify that no shared volume groups are online: lsvg -o In our environment, the output from the command, as shown in Example 3-10, indicates only the local volume group rootvg is online. Example 3-10 Verifying no shared volume groups are online on a cluster node [root@tivaix1:/home/root] lsvg -o rootvg If you do see shared volume groups listed, vary them offline by running the command: varyoffvg volume_group_name where volume_group_name is the name of the volume group. 4. Vary on all available shared volume groups. Run the command: varyonvg volume_group_name where volume_group_name is the name of the volume group, for each shared volume group. Example 3-11 shows how we varied on all shared volume groups. Example 3-11 How to vary on all shared volume groups on a cluster node [root@tivaix1:/home/root] varyonvg tiv_vg1 [root@tivaix1:/home/root] lsvg -o tiv_vg1 rootvg [root@tivaix1:/home/root] varyonvg tiv_vg2 [root@tivaix1:/home/root] lsvg -o tiv_vg2 tiv_vg1 rootvg Note how we used the lsvg command to verify at each step that the vary on operation succeeded. 5. Determine the corresponding logical volume(s) for each shared volume group varied on.108 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Use the following command to list the logical volume(s) of each volume group: lsvg -l volume_group_name where volume_group_name is the name of a shared volume group. As shown in Example 3-12, in our environment shared volume group tiv_vg1 has two logical volumes, lvtws1_log and lvtws1, and shared volume group tiv_vg2 has logical volumes lvtws2_log and lvtws2.Example 3-12 Logical volumes in each shared volume group varied on in a cluster node[root@tivaix1:/home/root] lsvg -l tiv_vg1tiv_vg1:LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINTlvtws1_log jfslog 1 2 2 closed/syncd N/Alvtws1 jfs 512 1024 2 closed/syncd /usr/maestro[root@tivaix1:/home/root] lsvg -l tiv_vg2tiv_vg2:LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINTlvtws2_log jfslog 1 2 2 closed/syncd N/Alvtws2 jfs 128 256 2 closed/syncd /usr/maestro26. Mount the corresponding JFS logical volume(s) for each shared volume group. Use the mount command to mount each JFS logical volume to its defined mount point. Example 3-13 shows how we mounted the JFS logical volumes in our environment.Example 3-13 Mounts of logical volumes on shared volume groups on a cluster node[root@tivaix1:/home/root] df /usr/maestroFilesystem 1024-blocks Free %Used Iused %Iused Mounted on/dev/hd2 2523136 148832 95% 51330 9% /usr[root@tivaix1:/home/root] mount /usr/maestro[root@tivaix1:/home/root] df /usr/maestroFilesystem 1024-blocks Free %Used Iused %Iused Mounted on/dev/lvtws1 2097152 1871112 11% 1439 1% /usr/maestro[root@tivaix1:/home/root] df /usr/maestro2Filesystem 1024-blocks Free %Used Iused %Iused Mounted on/dev/hd2 2523136 148832 95% 51330 9% /usr[root@tivaix1:/home/root] mount /usr/maestro2[root@tivaix1:/home/root] df /usr/maestro2Filesystem 1024-blocks Free %Used Iused %Iused Mounted on/dev/lvtws1 524288 350484 34% 1437 2% /usr/maestro2 Note how we use the df command to verify that the mount point before the mount command is in one file system, and after the mount command is attached to a different filesystem. The different file systems before and after the mount commands are highlighted in bold in Example 3-13. Chapter 3. High availability cluster implementation 109
    • 7. Unmount each logical volume on each shared volume group. Example 3-14 shows how we unmount all logical volumes from all shared volume groups. Example 3-14 Unmount logical volumes on shared volume groups on a cluster node [root@tivaix1:/home/root] umount /usr/maestro [root@tivaix1:/home/root] df /usr/maestro Filesystem 1024-blocks Free %Used Iused %Iused Mounted on /dev/hd2 2523136 148832 95% 51330 9% /usr [root@tivaix1:/home/root] umount /usr/maestro2 [root@tivaix1:/home/root] df /usr/maestro2 Filesystem 1024-blocks Free %Used Iused %Iused Mounted on /dev/hd2 2523136 148832 95% 51330 9% /usr Again, note how we use the df command to verify a logical volume is unmounted from a shared volume group. d. Vary off each shared volume group on the cluster node. Use the following command to vary off a shared volume group: varyoffvg volume_group_name where volume_group_name is the name of the volume group, for each shared volume group. The following example shows how we vary off the shared volume groups tiv_vg1 and tiv_vg2: Example 3-15 How to vary off shared volume groups on a cluster node [root@tivaix1:/home/root] varyoffvg tiv_vg1 [root@tivaix1:/home/root] lsvg -o tiv_vg2 rootvg [root@tivaix1:/home/root] varyoffvg tiv_vg2 [root@tivaix1:/home/root] lsvg -o rootvg Note how we use the lsvg command to verify that a shared volume group is varied off. 8. Repeat this procedure for the remaining cluster nodes. You must test that all volume groups and logical volumes can be accessed through the appropriate varyonvg and mount commands on each cluster node. You now know that if volume groups fail to migrate between cluster nodes after installing HACMP, then there is likely a problem with HACMP and not with the configuration of the volume groups themselves on the cluster nodes.110 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
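If you prefer to script the test, the whole sequence can be wrapped in a short loop and run on each node in turn. The following is only a sketch; the volume group names are the ones used in our scenario, and it assumes the mount points are already defined in /etc/filesystems:
#!/bin/ksh
# Check that each shared volume group and its file systems can be
# brought online and taken offline again on this node.
for vg in tiv_vg1 tiv_vg2
do
    varyonvg $vg || exit 1
    for fs in $(lsvg -l $vg | awk '$2 == "jfs" {print $NF}')
    do
        mount $fs && df $fs      # mount and show which device is attached
        umount $fs
    done
    varyoffvg $vg
done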
    • Configure Network AdaptersNetwork Adapters should be configured prior to installing HACMP. Important: When configuring Network Adapters, bind only the boot IP address to each network adapter. No configuration for service IP address and persistent IP address is needed at this point. Do not bind a service or persistent IP address to any adapters. A service and persistent IP address is configured after HACMP is installed.1. Log in as root on the cluster node.2. Enter the following command. This command will take you to the SMIT TCP/IP menu. # smitty tcpip3. From the TCP/IP menu, select Minimum Configuration & Startup (Figure 3-19 on page 112). You are prompted to select a network interface from the Available Network Interface list. Select the network interface you want to configure. Chapter 3. High availability cluster implementation 111
    • TCP/IP Move cursor to desired item and press Enter. Minimum Configuration & Startup Further Configuration Use DHCP for TCPIP Configuration & Startup IPV6 Configuration Quality of Service Configuration & Startup F1=Help F2=Refresh F3=Cancel Esc+8=Image Esc+9=Shell Esc+0=Exit Enter=DoFigure 3-19 The TCP/IP SMIT menu 4. For the network interface you have selected, specify the following items and press Enter. Figure 3-20 on page 113 shows the configuration for our cluster. HOSTNAME Hostname for the node. Internet ADDRESS Enter the IP address for the adapter. This must be the boot address that you planned for the adapter. Network MASK Enter the network mask. NAME SERVER Enter the IP address and the domain name of your name server. Default Gateway Enter the IP address of the default Gateway.112 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Minimum Configuration & Startup To Delete existing configuration data, please use Further Configuration menus Type or select values in entry fields. Press Enter AFTER making all desired changes. [TOP] [Entry Fields] * HOSTNAME [tivaix1] * Internet ADDRESS (dotted decimal) [192.168.100.101] Network MASK (dotted decimal) [255.255.254.0] * Network INTERFACE en0 NAMESERVER Internet ADDRESS (dotted decimal) [9.3.4.2] DOMAIN Name [itsc.austin.ibm.com] Default Gateway Address (dotted decimal or symbolic name) [9.3.4.41] Cost [0] # Do Active Dead Gateway Detection? no + [MORE...2] F1=Help F2=Refresh F3=Cancel F4=List Esc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=Image Esc+9=Shell Esc+0=Exit Enter=DoFigure 3-20 Configuring network adapters 5. Repeat steps 1 through 4 for all network adapters in the cluster. Attention: To implement an HA cluster for IBM Tivoli Workload Scheduler, install IBM Tivoli Workload Scheduler before proceeding to 3.2.4, “Installing HACMP 5.1 on AIX 5.2” on page 92. For instructions on installing IBM Tivoli Workload Scheduler in an HA cluster environment, refer to 4.1, “Implementing IBM Tivoli Workload Scheduler in an HACMP cluster” on page 184. Install HACMP The best results when installing HACMP are obtained if you plan the procedure before attempting it. We recommend that you read through the following installation procedures before undertaking them. If you make a mistake, uninstall HACMP; refer to “Remove HACMP” on page 134. Chapter 3. High availability cluster implementation 113
    • Tip: Install HACMP after all application servers are installed, configured, and verified operational. This simplifies troubleshooting because if the application server does not run after HACMP is installed, you know that addressing an HACMP issue will fix the error. You will not have to spend time identifying whether the problem is with your application or HACMP. The major steps to install HACMP are covered in the following sections: “Preparation” on page 114 “Install base HACMP 5.1” on page 122 “Update HACMP 5.1” on page 126 (Optional, use only if installation or configuration fails) “Remove HACMP” on page 134 The details of each step follow. Preparation By now you should have all the requirements fulfilled and all the preparation completed. In this section, we provide a step-by-step description of how to install HACMP Version 5.1 on AIX Version 5.2. Installation procedures may differ depending on which version of HACMP software you use. For versions other than 5.1, refer to the installation guide for the HACMP version that you install. Ensure that you are running AIX 5.2 Maintenance Level 02. To verify your current level of AIX, run the oslevel and lslpp commands, as shown in Example 3-16. Example 3-16 Verifying the currently installed maintenance level of AIX 5.2 [root@tivaix1:/home/root] oslevel -r 5200-02 [root@tivaix1:/home/root] lslpp -l bos.rte.commands Fileset Level State Description ---------------------------------------------------------------------------- Path: /usr/lib/objrepos bos.rte.commands 5.2.0.12 COMMITTED Commands Path: /etc/objrepos bos.rte.commands 5.2.0.0 COMMITTED Commands If you need to upgrade your version of AIX 5.2, visit the IBM Fix Central Web site: http://www-912.ibm.com/eserver/support/fixes/fcgui.jsp Be sure to upgrade from AIX 5.2.0.0 to Maintenance Level 01 first, then to Maintenance Level 02.114 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
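Besides oslevel -r, the instfix command can confirm whether a maintenance level is completely installed. A sketch (the keyword shown assumes Maintenance Level 02; adjust it for the level you are checking):
# instfix -i | grep AIX_ML       # summary line for each maintenance level
# instfix -ik 5200-02_AIX_ML     # check one specific maintenance level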
    • Figure 3-21 shows the IBM Fix Central Web page, and the settings you use to select the Web page with AIX 5.2 maintenance packages. (We show the entire Web page in Figure 3-21, but following figures omit the banners in the left-hand, upper, and bottom portions of the page.)Figure 3-21 IBM Fix Central Web page for downloading AIX 5.2 maintenance packages At the time of writing, Maintenance Level 02 is the latest available. We recommend that if you are currently running AIX Version 5.2, you upgrade to Maintenance Level 02. Maintenance Level 01 can be downloaded from: https://techsupport.services.ibm.com/server/mlfixes/52/01/00to01.html Maintenance Level 02 can be downloaded from: https://techsupport.services.ibm.com/server/mlfixes/52/02/01to02.html Chapter 3. High availability cluster implementation 115
    • Note: Check the IBM Fix Central Web site before applying any maintenance packages. After you ensure the AIX prerequisites are satisfied, you may prepare HACMP 5.1 installation media. To prepare HACMP 5.1 installation media on a cluster node, follow these steps: 1. Copy the HACMP 5.1 media to the hard disk on the node. We used /tmp/hacmp on our nodes to hold the HACMP 5.1 media. 2. Copy the latest fixes for HACMP 5.1 to the hard disk on the node. We used /tmp/hacmp on our nodes to hold the HACMP 5.1 fixes. 3. If you do not have the latest fixes for HACMP 5.1, download them from the IBM Fix Central Web site: http://www-912.ibm.com/eserver/support/fixes/fcgui.jsp 4. From this Web page, select pSeries, RS/6000 for the Server pop-up, AIX OS, Java™, compilers for the Product or fix type pop-up, Specific fixes for the Option pop-up, and AIX 5.2 for the OS level pop-up, then press Continue, as shown in Figure 3-22 on page 117.116 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Figure 3-22 Using the IBM Fix Central Web page for downloading HACMP 5.1 patches5. The Select fixes Web page is displayed, as shown in Figure 3-23 on page 118. We use this page to search for and download the fixes for APAR IY45695 and also the following PTF numbers: U496114, U496115, U496116, U496117, U496118, U496119, U496120, U496121, U496122, U496123, U496124, U496125, U496126, U496127, U496128, U496129, U496130, U496138, U496274, U496275 We used /tmp/hacmp_fixes1 for storing the fix downloads of APAR IY45695, and /tmp/hacmp_fixes2 for storing the fix downloads of the individual PTFs. Chapter 3. High availability cluster implementation 117
    • Figure 3-23 Select fixes page of IBM Fix Central Web site 6. To download the fixes for APAR IY45695, select APAR number or abstract for the Search by pop-up, enter IY45695 in the Search string field, and press Go. A browser dialog as shown in Figure 3-24 may appear, depending upon previous actions within IBM Fix Central. If it does appear, press OK to continue (Figure 3-24). Figure 3-24 Confirmation dialog presented in IBM Fix Central Select fixes page 7. The Select fixes page displays the fixes found, as shown in Figure 3-25 on page 119.118 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Figure 3-25 Select fixes page showing fixes found that match APAR IY456958. Highlight the APAR in the list box, then press the Add to my download list link. Press Continue, which displays the Packaging options page.9. Select AIX 5200-01 for the Indicate your current maintenance level pop-up. At the time of writing, the only available download servers are in North America, so selecting a download server is an optional step. Select a download server if a more appropriate server is available in the pop-up. Now press Continue, as shown in Figure 3-26 on page 120. Chapter 3. High availability cluster implementation 119
    • Figure 3-26 Packaging options page for packaging fixes for APAR IY45695 10.The Download fixes page is displayed as shown in Figure 3-27 on page 121. Choose an appropriate option from the Download and delivery options section of the page, then follow the instructions given to download the fixes.120 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Figure 3-27 Download fixes page for fixes related to APAR IY45695 11.Downloading fixes for PTFs follows the same procedure as for downloading the fixes for APAR IY45695, except you select Fileset or PTF number in the Search by pop-up in the Select fixes Web page. Chapter 3. High availability cluster implementation 121
    • 12.Copy the installation media to each cluster node or make it available via a remote filesystem like NFS, AFS®, or DFS™. Install base HACMP 5.1 After the installation media is prepared on a cluster node, install the base HACMP 5.1 Licensed Program Products (LPPs): 1. Enter the command smitty install to start installing the software. The Software Installation and Maintenance SMIT panel is displayed as in Figure 3-28. Software Installation and Maintenance Move cursor to desired item and press Enter. Install and Update Software List Software and Related Information Software Maintenance and Utilities Software Service Management Network Installation Management EZ NIM (Easy NIM Tool) System Backup Manager F1=Help F2=Refresh F3=Cancel F8=Image F9=Shell F10=Exit Enter=DoFigure 3-28 Screen displayed after running command smitty install 2. Go to Install and Update Software > Install Software and press Enter. This brings up the Install Software SMIT panel (Figure 3-29 on page 123).122 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Install Software Type or select a value for the entry field. Press Enter AFTER making all desired changes. [Entry Fields] * INPUT device / directory for software [/tmp/hacmp] + F1=Help F2=Refresh F3=Cancel F4=List F5=Reset F6=Command F7=Edit F8=Image F9=Shell F10=Exit Enter=DoFigure 3-29 Filling out the INPUT device/directory for software field in the Install Software smit panel 3. Enter the directory that the HACMP 5.1 software is stored under into the INPUT device / directory for software field and press Enter, as shown in Figure 3-29. In our environment we entered the directory /tmp/hacmp into the field and pressed Enter. This displays the Install Software SMIT panel with all the installation options (Figure 3-30 on page 124). Chapter 3. High availability cluster implementation 123
    • Install Software Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] * INPUT device / directory for software /tmp/hacmp * SOFTWARE to install [_all_latest] + PREVIEW only? (install operation will NOT occur) no + COMMIT software updates? yes + SAVE replaced files? no + AUTOMATICALLY install requisite software? yes + EXTEND file systems if space needed? yes + OVERWRITE same or newer versions? no + VERIFY install and check file sizes? no + Include corresponding LANGUAGE filesets? yes + DETAILED output? no + Process multiple volumes? yes + ACCEPT new license agreements? no + Preview new LICENSE agreements? no + F1=Help F2=Refresh F3=Cancel F4=List F5=Reset F6=Command F7=Edit F8=Image F9=Shell F10=Exit Enter=DoFigure 3-30 Install Software SMIT panel with all installation options 4. Press Enter to install all HACMP 5.1 Licensed Program Products (LPPs) in the selected directory. 5. SMIT displays an installation confirmation dialog as shown in Figure 3-31 on page 125. Press Enter to continue. The COMMAND STATUS SMIT panel is displayed. Throughout the rest of this redbook, if a SMIT confirmation dialog is displayed it is assumed you will know how to respond to it, so we do not show this step again.124 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • +--------------------------------------------------------------------------+ ¦ ARE YOU SURE? ¦ ¦ ¦ ¦ Continuing may delete information you may want ¦ ¦ to keep. This is your last chance to stop ¦ ¦ before continuing. ¦ ¦ Press Enter to continue. ¦ ¦ Press Cancel to return to the application. ¦ ¦ ¦ ¦ F1=Help F2=Refresh F3=Cancel ¦ ¦ F8=Image F10=Exit Enter=Do ¦ +--------------------------------------------------------------------------+ Figure 3-31 Installation confirmation dialog for SMIT 6. The COMMAND STATUS SMIT panel displays the progress of the installation. Installation will take several minutes, depending upon the speed of your machine. When the installation completes, the panel looks similar to Figure 3-32. COMMAND STATUS Command: OK stdout: yes stderr: no Before command completion, additional instructions may appear below. [TOP] geninstall -I "a -cgNpQqwX -J" -Z -d /usr/sys/inst.images/hacmp/hacmp_510 -f Fi le 2>&1 File: I:cluster.hativoli.client 5.1.0.0 I:cluster.hativoli.server 5.1.0.0 I:cluster.haview.client 4.5.0.0 I:cluster.haview.server 4.5.0.0 ******************************************************************************* [MORE...90] F1=Help F2=Refresh F3=Cancel F6=Command F8=Image F9=Shell F10=Exit /=Find n=Find NextFigure 3-32 COMMAND STATUS SMIT panel showing successful installation of HACMP 5.1 Chapter 3. High availability cluster implementation 125
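Before applying the fixes, it is worth confirming that the base filesets committed cleanly at the 5.1.0.0 level. A quick check (a sketch; the fileset pattern can be widened to cluster.* if you prefer):
# lslpp -l "cluster.es.*"    # base HACMP/ES filesets, expected at level 5.1.0.0
# lppchk -v                  # reports any filesets left in an inconsistent state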
• Update HACMP 5.1
After installing the base HACMP 5.1 Licensed Program Products (LPPs), you must upgrade them with the latest fixes available. To update HACMP 5.1:
1. Enter the command smitty update to start updating HACMP 5.1. The Update Software by Fix (APAR) SMIT panel is displayed as shown in Figure 3-33.
Update Software by Fix (APAR)
Type or select a value for the entry field. Press Enter AFTER making all desired changes.
[Entry Fields]
* INPUT device / directory for software [] +
F1=Help F2=Refresh F3=Cancel F4=List
F5=Reset F6=Command F7=Edit F8=Image
F9=Shell F10=Exit Enter=Do
Figure 3-33 Update Software by Fix (APAR) SMIT panel displayed by running command smitty update
2. Enter in the INPUT device / directory for software field the directory that you used to store the fixes for APAR IY45695, then press Enter. We used /tmp/hacmp_fixes1 in our environment, as shown in Figure 3-34 on page 127.
126 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Update Software by Fix (APAR) Type or select a value for the entry field. Press Enter AFTER making all desired changes. [Entry Fields] * INPUT device / directory for software [/tmp/hacmp_fixes1] + F1=Help F2=Refresh F3=Cancel F4=List F5=Reset F6=Command F7=Edit F8=Image F9=Shell F10=Exit Enter=DoFigure 3-34 Entering directory of APAR IY45695 fixes into Update Software by Fix (APAR) SMIT panel 3. The Update Software by Fix (APAR) SMIT panel is displayed with all the update options. Move the cursor to the FIXES to install item as shown in Figure 3-35 on page 128, and press F4 (or Esc 4) to select the HACMP fixes to update. Chapter 3. High availability cluster implementation 127
    • Update Software by Fix (APAR) Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] * INPUT device / directory for software /tmp/hacmp_fixes1 * FIXES to install [] + PREVIEW only? (update operation will NOT occur) no + COMMIT software updates? yes + SAVE replaced files? no + EXTEND file systems if space needed? yes + VERIFY install and check file sizes? no + DETAILED output? no + Process multiple volumes? yes + F1=Help F2=Refresh F3=Cancel F4=List F5=Reset F6=Command F7=Edit F8=Image F9=Shell F10=Exit Enter=DoFigure 3-35 Preparing to select fixes for APAR IY45695 in Update Software by Fix (APAR) SMIT panel 4. The FIXES to install SMIT dialog is displayed as in Figure 3-36 on page 129. This lists all the fixes for APAR IY45695 that can be applied.128 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • +--------------------------------------------------------------------------+ ¦ FIXES to install ¦ ¦ ¦ ¦ Move cursor to desired item and press F7. Use arrow keys to scroll. ¦ ¦ ONE OR MORE items can be selected. ¦ ¦ Press Enter AFTER making all selections. ¦ ¦ ¦ ¦ [TOP] ¦ ¦ IY45538 - ENH: Updated Online Planning Worksheets for HACMP R510 ¦ ¦ IY45539 - ENH: clrgmove support of replicated resources ¦ ¦ IY47464 UPDATE WILL PUT IN TWO NAME_SERVER STANZAS ¦ ¦ IY47503 HAES,HAS: BROADCAST ROUTES EXIST ON LO0 INTERFACE AFTER ¦ ¦ IY47577 WITH TCB ACTIVE, MANY MSG 3001-092 IN HACMP.OUT DURING SYNCLVO ¦ ¦ IY47610 HAES: FAILURE TO UMOUNT EXPORTED FILESYSTEM WITH DEVICE BUSY - ¦ ¦ IY47777 IF ONE NODE UPGRADED TO HAES 5.1 SMIT START CLUSTER SERVICES ¦ ¦ IY48184 Fixes for Multiple Site Clusters ¦ ¦ [MORE...36] ¦ ¦ ¦ ¦ F1=Help F2=Refresh F3=Cancel ¦ ¦ F7=Select F8=Image F10=Exit ¦ ¦ Enter=Do /=Find n=Find Next ¦ +--------------------------------------------------------------------------+Figure 3-36 Selecting fixes for APAR IY45695 in FIXES to install SMIT dialog5. Select all fixes in the dialog by pressing F7 (or Esc 7) on each line so that a selection symbol (>) is added in front of each line as shown in Figure 3-37 on page 130. Press Enter after all fixes are selected. Chapter 3. High availability cluster implementation 129
    • +--------------------------------------------------------------------------+ ¦ FIXES to install ¦ ¦ ¦ ¦ Move cursor to desired item and press F7. Use arrow keys to scroll. ¦ ¦ ONE OR MORE items can be selected. ¦ ¦ Press Enter AFTER making all selections. ¦ ¦ ¦ ¦ [MORE...36] ¦ ¦ > IY48918 CSPOC:Add a Shared FS gives error in cspoc.log ¦ ¦ > IY48922 CSPOC:disk replacement does not work ¦ ¦ > IY48926 incorrect version info on node_up ¦ ¦ > IY49152 cluster synch changes NW attribute from private to public ¦ ¦ > IY49490 ENH: relax clverify check for nodes in fast connect mt rg ¦ ¦ > IY49495 clstrmgr has memory leaks ¦ ¦ > IY49497 ENH: Need option to leave log files out of cluster snapshot ¦ ¦ > IY49498 Verification dialogs use inconsistent terminology. ¦ ¦ [BOTTOM] ¦ ¦ ¦ ¦ F1=Help F2=Refresh F3=Cancel ¦ ¦ F7=Select F8=Image F10=Exit ¦ ¦ Enter=Do /=Find n=Find Next ¦ +--------------------------------------------------------------------------+ Figure 3-37 Selecting all fixes of APAR IY45695 in FIXES to install SMIT dialog 6. The Update Software by Fix (APAR) SMIT panel is displayed again (Figure 3-38 on page 131), showing all the selected fixes from the FIXES to install SMIT dialog in the FIXES to install field. Press Enter to begin applying all fixes of APAR IY45695.130 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Update Software by Fix (APAR) Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] * INPUT device / directory for software /tmp/hacmp_fixes1 * FIXES to install [IY45538 IY45539 IY474> + PREVIEW only? (update operation will NOT occur) no + COMMIT software updates? yes + SAVE replaced files? no + EXTEND file systems if space needed? yes + VERIFY install and check file sizes? no + DETAILED output? no + Process multiple volumes? yes + F1=Help F2=Refresh F3=Cancel F4=List F5=Reset F6=Command F7=Edit F8=Image F9=Shell F10=Exit Enter=DoFigure 3-38 Applying all fixes of APAR IY45695 in Update Software by Fix (APAR) SMIT panel 7. The COMMAND STATUS SMIT panel is displayed. It shows the progress of the selected fixes for APAR IY45695 applied to the system. A successful update will appear similar to Figure 3-39 on page 132. Chapter 3. High availability cluster implementation 131
    • COMMAND STATUS Command: OK stdout: yes stderr: no Before command completion, additional instructions may appear below. [TOP] instfix -d /usr/sys/inst.images/hacmp/hacmp_510_fixes -f /tmp/.instfix_selection s.12882 > File installp -acgNpqXd /usr/sys/inst.images/hacmp/hacmp_510_fixes -f File File: cluster.adt.es.client.include 05.01.0000.0002 cluster.adt.es.client.samples.clinfo 05.01.0000.0002 cluster.adt.es.client.samples.clstat 05.01.0000.0001 cluster.adt.es.client.samples.libcl 05.01.0000.0001 cluster.es.client.lib 05.01.0000.0002 cluster.es.client.rte 05.01.0000.0002 [MORE...67] F1=Help F2=Refresh F3=Cancel F6=Command F8=Image F9=Shell F10=Exit /=Find n=Find NextFigure 3-39 COMMAND STATUS SMIT panel showing all fixes of APAR IY45695 successfully applied 8. Confirm that the fixes were installed by first exiting SMIT. Press F10 (or Esc 0) to exit SMIT. Then enter the following command: lslpp -l "cluster.*" The output should be similar to that shown in Example 3-17. Note that some of the Licensed Program Products (LPPs) show a version other than the 5.1.0.0 base version of HACMP. This confirms that the fixes were successfully installed.Example 3-17 Confirming installation of fixes for APAR IY45695 [root@tivaix1:/home/root]lslpp -l "cluster.*" Fileset Level State Description ---------------------------------------------------------------------------- Path: /usr/lib/objrepos cluster.adt.es.client.demos 5.1.0.0 COMMITTED ES Client Demos cluster.adt.es.client.include 5.1.0.2 COMMITTED ES Client Include Files cluster.adt.es.client.samples.clinfo132 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • 5.1.0.2 COMMITTED ES Client CLINFO Samples cluster.adt.es.client.samples.clstat 5.1.0.1 COMMITTED ES Client Clstat Samples cluster.adt.es.client.samples.demos 5.1.0.0 COMMITTED ES Client Demos Samples cluster.adt.es.client.samples.libcl 5.1.0.1 COMMITTED ES Client LIBCL Samples cluster.adt.es.java.demo.monitor 5.1.0.0 COMMITTED ES Web Based Monitor Demo cluster.adt.es.server.demos 5.1.0.0 COMMITTED ES Server Demos cluster.adt.es.server.samples.demos 5.1.0.1 COMMITTED ES Server Sample Demos cluster.adt.es.server.samples.images 5.1.0.0 COMMITTED ES Server Sample Images cluster.doc.en_US.es.html 5.1.0.1 COMMITTED HAES Web-based HTML Documentation - U.S. English cluster.doc.en_US.es.pdf 5.1.0.1 COMMITTED HAES PDF Documentation - U.S. English cluster.es.cfs.rte 5.1.0.1 COMMITTED ES Cluster File System Support cluster.es.client.lib 5.1.0.2 COMMITTED ES Client Libraries cluster.es.client.rte 5.1.0.2 COMMITTED ES Client Runtime cluster.es.client.utils 5.1.0.2 COMMITTED ES Client Utilities cluster.es.clvm.rte 5.1.0.0 COMMITTED ES for AIX Concurrent Access cluster.es.cspoc.cmds 5.1.0.2 COMMITTED ES CSPOC Commands cluster.es.cspoc.dsh 5.1.0.0 COMMITTED ES CSPOC dsh cluster.es.cspoc.rte 5.1.0.2 COMMITTED ES CSPOC Runtime Commands cluster.es.plugins.dhcp 5.1.0.1 COMMITTED ES Plugins - dhcp cluster.es.plugins.dns 5.1.0.1 COMMITTED ES Plugins - Name Server cluster.es.plugins.printserver 5.1.0.1 COMMITTED ES Plugins - Print Server cluster.es.server.diag 5.1.0.2 COMMITTED ES Server Diags cluster.es.server.events 5.1.0.2 COMMITTED ES Server Events cluster.es.server.rte 5.1.0.2 COMMITTED ES Base Server Runtime cluster.es.server.utils 5.1.0.2 COMMITTED ES Server Utilitiescluster.es.worksheets 5.1.0.2 COMMITTED Online Planning Worksheets cluster.license 5.1.0.0 COMMITTED HACMP Electronic License cluster.msg.en_US.cspoc 5.1.0.0 COMMITTED HACMP CSPOC Messages - U.S. English cluster.msg.en_US.es.client 5.1.0.0 COMMITTED ES Client Messages - U.S. English cluster.msg.en_US.es.server 5.1.0.0 COMMITTED ES Recovery Driver Messages - U.S. EnglishPath: /etc/objrepos cluster.es.client.rte 5.1.0.0 COMMITTED ES Client Runtime cluster.es.clvm.rte 5.1.0.0 COMMITTED ES for AIX Concurrent Access Chapter 3. High availability cluster implementation 133
    • cluster.es.server.diag 5.1.0.0 COMMITTED ES Server Diags cluster.es.server.events 5.1.0.0 COMMITTED ES Server Events cluster.es.server.rte 5.1.0.2 COMMITTED ES Base Server Runtime cluster.es.server.utils 5.1.0.0 COMMITTED ES Server Utilities Path: /usr/share/lib/objrepos cluster.man.en_US.es.data 5.1.0.2 COMMITTED ES Man Pages - U.S. English 9. Repeat this procedure for each node in the cluster to install the LPPs for APAR IY45695. 10.Repeat this entire procedure for all the fixes corresponding to the preceding PTFs. Enter the directory path these fixes are stored in into the INPUT device / directory for software field referred to by step 2. We used /tmp/hacmp_fixes2 in our environment. Remove HACMP If you make a mistake with the HACMP installation, or if subsequent configuration fails due to Object Data Manager (ODM) errors or another error that prevents successful configuration, you can remove HACMP to recover to a known state. Removing resets all ODM entries, and removes all HACMP files. Re-installing will create new ODM entries, and often solve problems with corrupted HACMP ODM entries. To remove HACMP: 1. Enter the command smitty remove. 2. The Remove Installed Software SMIT panel is displayed. Enter the following text in the SOFTWARE name field: cluster.*, as shown in Figure 3-40 on page 135.134 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Remove Installed Software Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] * SOFTWARE name [cluster.*] + PREVIEW only? (remove operation will NOT occur) yes + REMOVE dependent software? no + EXTEND file systems if space needed? no + DETAILED output? no + F1=Help F2=Refresh F3=Cancel F4=List F5=Reset F6=Command F7=Edit F8=Image F9=Shell F10=Exit Enter=DoFigure 3-40 How to specify removal of HACMP in Remove Installed Software SMIT panel 3. Move the cursor to the PREVIEW only? (remove operation will NOT occur) field and press Tab to change the value to no, change the EXTEND file systems if space needed? field to yes, and change the DETAILED output field to yes, as shown in Figure 3-41 on page 136. Chapter 3. High availability cluster implementation 135
    • Remove Installed Software Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] * SOFTWARE name [cluster.*] + PREVIEW only? (remove operation will NOT occur) no + REMOVE dependent software? no + EXTEND file systems if space needed? yes + DETAILED output? yes + F1=Help F2=Refresh F3=Cancel F4=List F5=Reset F6=Command F7=Edit F8=Image F9=Shell F10=Exit Enter=DoFigure 3-41 Set options for removal of HACMP in Installed Software SMIT panel 4. Press Enter to start removal of HACMP. The COMMAND STATUS SMIT panel displays the progress and final status of the removal operation. A successful removal looks similar to Figure 3-42 on page 137.136 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • COMMAND STATUS Command: OK stdout: yes stderr: no Before command completion, additional instructions may appear below. [TOP] geninstall -u -I "pX -V2 -J -w" -Z -f File 2>&1 File: cluster.* +-----------------------------------------------------------------------------+ Pre-deinstall Verification... +-----------------------------------------------------------------------------+ Verifying selections...done Verifying requisites...done Results... [MORE...134] F1=Help F2=Refresh F3=Cancel F6=Command F8=Image F9=Shell F10=Exit /=Find n=Find NextFigure 3-42 Successful removal of HACMP as shown by COMMAND STATUS SMIT panel 5. Press F10 (or Esc 0) to exit SMIT. When you finish the installation of HACMP, you need to configure it for the application servers you want to make highly available. In this redbook, we show how to do this with IBM Tivoli Workload Scheduler first in 4.1.10, “Configure HACMP for IBM Tivoli Workload Scheduler” on page 210, then IBM Tivoli Management Framework in 4.1.11, “Add IBM Tivoli Management Framework” on page 303. Chapter 3. High availability cluster implementation 137
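As a quick command-line cross-check of the fix-level work above (a minimal sketch; the exact messages vary with the AIX level), you can ask instfix whether every fileset of the APAR is present, and re-run the lslpp query after a removal to confirm that no cluster filesets remain:
# instfix -ik IY45695          (reports whether all filesets for the APAR were found)
# lslpp -l "cluster.*"         (after removing HACMP, this should report that no cluster.* filesets are installed)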
    • 3.3 Implementing a Microsoft Cluster In this section, we walk you through the installation process for a Microsoft Cluster (also referred to as Microsoft Cluster Service or MSCS throughout the book). We also discuss the hardware and software aspects of MSCS, as well as the installation procedure. The MSCS environment that we create in this chapter is a two-node hot standby cluster. The system will share two external SCSI drives connected to each of the nodes via a Y-cable. Figure 3-43 illustrates the system configuration. Public NIC Private NIC Private NIC Public NIC IP 9.3.4.197 IP 192.168.1.1 IP 192.168.1.2 IP 9.3.4.198 SCSI ID-6 X: SCSI ID-7 SCSI SCSI SCSI ID-5 C: C: Y : & Z: SCSI ID-4 tivw2k1 tivw2k2Figure 3-43 Microsoft Cluster environment The cluster is connected using four Network Interface Cards (NICs). Each node has a private NIC and a public NIC. In an MSCS, the heartbeat connection is referred to as a private connection. The private connection is used for internal cluster communications and is connected between the two nodes using a crossover cable. The public NIC is the adapter that is used by the applications that are running locally on the server, as well as cluster applications that may move between the nodes in the cluster. The operating system running on our nodes is Windows 2000 Advanced Edition with Service Pack 4 installed. In our initial cluster installation, we will set up the default cluster group. Cluster groups in an MSCS environment are logical groups of resources that can be moved from one node to another. The default cluster group that we will set up will138 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• contain the shared drive X:, an IP address (9.3.4.199), and a network name (tivw2kv1). 3.3.1 Microsoft Cluster hardware considerations When designing a Microsoft Cluster, it is important to make sure all the hardware you would like to use is compatible with the Microsoft Cluster software. To make this easy, Microsoft maintains a Hardware Compatibility List (HCL) found at: http://www.microsoft.com/whdc/hcl/search.mspx Check the HCL before you order your hardware to ensure your cluster configuration will be supported. 3.3.2 Planning and designing a Microsoft Cluster installation You need to execute some setup tasks before you start installing Microsoft Cluster Service. Following are the requirements for a Microsoft Cluster: Configure the Network Interface Cards (NICs) Each node in the cluster will need two NICs: one for public communications, and one for private cluster communications. The NICs will have to be configured with static IP addresses. Table 3-14 shows our configuration. Table 3-14 NIC IP addresses Node IP tivw2k1 (public) 9.3.4.197 tivw2k1 (private) 192.168.1.1 tivw2k2 (public) 9.3.4.198 tivw2k2 (private) 192.168.1.2 Set up the Domain Name System (DNS) Make sure all IP addresses for your NICs, and IP addresses that will be used by the cluster groups, are added to the Domain Name System (DNS). The private NIC IP addresses do not have to be added to DNS. Our configuration will require that the IP addresses and names listed in Table 3-15 on page 140 be added to the DNS. Chapter 3. High availability cluster implementation 139
• Table 3-15 DNS entries required for the cluster Hostname IP Address tivw2k1 9.3.4.197 tivw2k2 9.3.4.198 tivw2kv1 9.3.4.199 tivw2kv2 9.3.4.175 Set up the shared storage When setting up the shared storage devices, ensure that all drives are partitioned correctly and that they are all formatted with the NT filesystem (NTFS). When setting up the drives, ensure that both nodes are assigned the same drive letters for each partition and that the drives are set up as basic drives. We chose to set up our drive letters starting from the end of the alphabet so we would not interfere with any domain login scripts or temporary storage devices. If you are using SCSI drives, ensure that the drives are all using different SCSI IDs and that the drives are terminated correctly. When you partition your drives, ensure you set up a partition specifically for the quorum. The quorum is a partition used by the cluster service to store cluster configuration database checkpoints and log files. The quorum partition needs to be at least 100 MB in size. Important: Microsoft recommends that the quorum partition be on a separate disk, and also recommends that the partition be 500 MB in size. Table 3-16 illustrates how we set up our drives. Table 3-16 Shared drive partition table Disk Drive Letter Size Label Disk 1 X: 34 GB Partition 1 Disk 2 Y: 33.9 GB Partition 2 Z: 100 MB Quorum Note: When configuring the disks, make sure that you configure them on one node at a time and that the node that is not being configured is powered off. If both nodes try to control the disk at the same time, they may cause disk corruption. 140 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
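Before moving on, it can help to spot-check from a command prompt that the names in Table 3-15 resolve to the expected addresses and that the two nodes can reach each other over the public network. This is only a quick sketch using the names and addresses from our environment:
C:\> nslookup tivw2kv1        (should return 9.3.4.199)
C:\> nslookup tivw2kv2        (should return 9.3.4.175)
C:\> ping -n 1 tivw2k2        (run from tivw2k1 to confirm public-network connectivity)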
    • Update the operating system Before installing the cluster service, connect to the Microsoft Software Update Web site to ensure you have all the latest hardware drivers and software patches installed. The Microsoft Software Update Web site can be found at: http://windowsupdate.microsoft.com Create a domain account for the cluster The cluster service requires that a domain account be created under which the cluster service will run. The domain account must be a member of the administrator group on each of the nodes in the cluster. Make sure you set the account so that the user cannot change the password and that the password never expires. We created the account “cluster_service” for our cluster. Add nodes to the domain The cluster service runs under a domain account. In order for the domain account to be able to authenticate against the domain controller, the nodes must join the domain where the cluster user has been created.3.3.3 Microsoft Cluster Service installation Here we discuss the Microsoft Cluster Service installation process. The installation is broken into three sections: installation of the primary node; installation of the secondary node; and configuration of the cluster resources. Following is a high-level overview of the installation procedure. Detailed information for each step in the process are provided in the following sections. Installation of the MSCS node 1 Important: Before starting the installation on Node 1, make sure that Node 2 is powered off. The cluster service is installed as a Windows component. To install the service, the Windows 2000 Advanced Server CD-ROM should be in CD-ROM drive. You can save time by copying the i386 directory from the CD to the local drive. 1. To start the installation, open the Start menu and select Settings -> Control Panel and then double-click Add/Remove Programs. Chapter 3. High availability cluster implementation 141
    • 2. Click Add/Remove Windows Components, located on the left side of the window. Select Cluster Service from the list of components as shown in Figure 3-44, then click Next. Figure 3-44 Windows Components Wizard142 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • 3. Make sure that Remote administration mode is checked and click Next (Figure 3-45). You will be asked to insert the Windows 2000 Advanced Server CD if it is not already inserted. If you copied the CD to the local drive, select the location where it was copied.Figure 3-45 Windows Components Wizard Chapter 3. High availability cluster implementation 143
    • 4. Click Next at the welcome screen (Figure 3-46). Figure 3-46 Welcome screen144 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • 5. The next window (Figure 3-47) is used by Microsoft to verify that you are aware that it will not support hardware that is not included in its Hardware Compatibility List (HCL). To move on to the next step of the installation, click I Understand and then click Next.Figure 3-47 Hardware Configuration Chapter 3. High availability cluster implementation 145
• 6. Now that we have located the installation media and have acknowledged the support agreement, we can start the actual installation. The next screen is used to select whether you will be installing the first node or any additional node. We will install the first node in the cluster at this point, so make sure that the appropriate radio button is selected and click Next (Figure 3-48). We will return to this screen again later when we install the second node in the cluster. Figure 3-48 Create or Join a Cluster 146 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• 7. We must now name our cluster. The name is the local name associated with the whole cluster. This is not the virtual name that is associated with a cluster group. This name is used by the Microsoft Cluster Administrator utility to administer the cluster resources. We prefer to use the same name as the virtual hostname to prevent confusion. In this case, we call it TIVW2KV1. After you have entered a name for your cluster, click Next (Figure 3-49). Figure 3-49 Cluster Name Chapter 3. High availability cluster implementation 147
    • 8. The next step is to enter the domain account that the cluster service will use. See the pre-installation setup section for details on setting up the domain account that the cluster service will use. Click Next (Figure 3-50). Figure 3-50 Select an Account148 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • 9. The next window is used to determine the disks that the cluster service will manage. In the example we have two partitions, one for the quorum and another for the data. Make sure both are set up as managed disks. Click Next (Figure 3-51).Figure 3-51 Add or Remove Managed Disks Chapter 3. High availability cluster implementation 149
• 10.We now need to select where the cluster checkpoint and log files will be stored. This disk is referred to as the Quorum Disk. The quorum is a vital part of the cluster, as it is used for storing critical cluster files. If the data on the Quorum Disk becomes corrupt, the cluster will be unusable. It is important to back up this data regularly so you will be able to recover your cluster. It is recommended that you have at least 100 MB on a separate partition reserved for this purpose; refer to the preinstallation setup section on disk preparation. After you select your Quorum Disk, click Next (Figure 3-52). Figure 3-52 Cluster File Storage 150 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • 11.The next step is to configure networking. A window will pop up to recommend that you use multiple public adapters to remove any single point of failure. Click Next to continue (Figure 3-53).Figure 3-53 Warning window Chapter 3. High availability cluster implementation 151
    • 12.The next section will prompt you to identify each NIC as either public, private or both. Since we named our adapters ahead of time, this is easy. Set the adapter that is labeled Public Network Connection as Client access only (public network). Click Next (Figure 3-54). Figure 3-54 Network Connections - All communications152 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • 13.Now we will configure the private network adapter. This adapter is used as a heartbeat connection between the two nodes of the cluster and is connected via a crossover cable. Since this adapter is not accessible from the public network, this is considered a private connection and should be configured as Internal cluster communications only (private network). Click Next (Figure 3-55).Figure 3-55 Network Connections - Internal cluster communications only (privatenetwork) Chapter 3. High availability cluster implementation 153
    • 14.Because we configured two adapters to be capable of communicating as private adapters, we need to select the priority in which the adapters will communicate. In our case, we want the Private Network Connection to serve as our primary private adapter. We will use the Public Network Connection as our backup adapter. Click Next to continue (Figure 3-56). Figure 3-56 Network priority setup154 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• 15.Once the network adapters have been configured, it is time to create the cluster resources. The first cluster resource is the cluster IP address. The cluster IP address is the IP address associated with the cluster resource group; it will follow the resource group when it is moved from node to node. This cluster IP address is commonly referred to as the virtual IP. To set up the cluster IP address, you will need to enter the IP address and subnet mask that you plan to use, and select the Public Network Connection as the network to use. Click Next (Figure 3-57). Figure 3-57 Cluster IP Address Chapter 3. High availability cluster implementation 155
• 16.Click Finish to complete the cluster service configuration (Figure 3-58). Figure 3-58 Cluster Service Configuration Wizard 17.The next window is just an informational pop-up letting you know that the Cluster Administrator application is now available. The cluster service is managed using the Cluster Administrator tool. Click OK (Figure 3-59). Figure 3-59 Cluster Service Configuration Wizard 156 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• 18.Click Finish one more time to close the installation wizard (Figure 3-60). Figure 3-60 Windows Components Wizard At this point, the installation of the cluster service on the primary node is complete. Now that we have created a cluster, we will need to add additional nodes to the cluster. Installing the second node The next step is to install the second node in the cluster. To add the second node, you will have to perform the following steps on the secondary node. The installation of the secondary node is relatively easy, since the cluster is configured during the installation on the primary node. The first few steps are identical to installing the cluster service on the primary node. To install the cluster service on the secondary node: 1. Go to the Start Menu and select Settings -> Control Panel and double-click Add/Remove Programs. Chapter 3. High availability cluster implementation 157
    • 2. Click Add/Remove Windows Components, located on the left side of the window, and then select Cluster Service from the list of components. Click Next to start the installation (Figure 3-61). Figure 3-61 Windows Components Wizard158 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • 3. Make sure the Remote administration mode is selected (Figure 3-62); it should be the only option available. Click Next to continue.Figure 3-62 Windows Components Wizard Chapter 3. High availability cluster implementation 159
    • 4. Click Next past the welcome screen (Figure 3-63). Figure 3-63 Windows Components Wizard160 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• 5. Once again you will have to verify that the hardware that you have selected is compatible with the software you are installing, and that you understand that Microsoft will not support hardware that is not on the HCL. Click I Understand and then Next to continue (Figure 3-64). Figure 3-64 Hardware Configuration Chapter 3. High availability cluster implementation 161
    • 6. The next step is to select that you will be adding the second node to the cluster. Once the second node option is selected, click Next to continue (Figure 3-65). Figure 3-65 Create or Join a Cluster162 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• 7. You will now have to type in the name of the cluster that you would like the second node to be a member of. Since we set up a domain account to be used for the cluster service, we will not need to check the connect to cluster box. Click Next (Figure 3-66). Figure 3-66 Cluster Name Chapter 3. High availability cluster implementation 163
    • 8. The next window prompts you for a password for the domain account that we installed the primary node with. Enter the password and click Next (Figure 3-67). Figure 3-67 Select an Account164 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • 9. Click Finish to complete the installation (Figure 3-68).Figure 3-68 Finish the installation Chapter 3. High availability cluster implementation 165
• 10.The next step is to verify that the cluster works. To verify that the cluster is operational, we will need to open the Cluster Administrator. You can open the Cluster Administrator in the Start Menu by selecting Programs -> Administrative Tools -> Cluster Administrator (Figure 3-69). You will notice that the cluster will have two groups: one called cluster group, and the other called Disk Group 1: – The cluster group is the group that contains the virtual IP, the network name, and the cluster shared disk. – Disk Group 1 at this time only contains our quorum disk. In order to verify that the cluster is functioning properly, we need to move the cluster group from one node to the other. You can move the cluster group by right-clicking the icon and selecting Move Group. After you have done this, you should see the group icon change for a few seconds while the resources are moved to the secondary node. Once the group has been moved, you should see the icon return to normal, and the owner of the group should now be the second node in the cluster. Figure 3-69 Verifying that the cluster works The cluster service is now installed and we are ready to start adding applications to our cluster groups. 166 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
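The same check can be done from a command prompt with the cluster.exe utility that is installed with the Cluster Service. This is only a sketch; the group and node names are the ones used in our environment, and the exact switch syntax should be confirmed with cluster /? on your system:
C:\> cluster group                                    (lists each group with its owning node and state)
C:\> cluster group "Cluster Group" /move:tivw2k2      (moves the default group to the second node)
C:\> cluster group "Cluster Group" /move              (moves it back again)
If both moves complete and the group comes back online each time, the cluster is behaving as expected.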
• Configuring the cluster resources Now it is time to configure the cluster resources. The default setup using the cluster service installation wizard is not optimal for our Tivoli environment. For the scenarios used later in this book, we have to set up the cluster resources for a mutual takeover scenario. To support this, we have to modify the current resource groups and add two resources. Figure 3-70 illustrates the desired configuration. tivw2k1 tivw2k2 TIVW2KV1 Resource Group Drive X: IP Address 9.3.4.199 Network Name TIVW2KV1 TIVW2KV2 Resource Group Drive Y: Z: IP Address 9.3.4.175 Network Name TIVW2KV2 Figure 3-70 Cluster resource diagram The following steps will guide you through the cluster configuration. Chapter 3. High availability cluster implementation 167
    • 1. The first step is to rename the cluster resource groups. a. Right-click the cluster group containing the Y: and Z: drive resource and select Rename (Figure 3-71). Enter the name TIVW2KV1. b. Right-click the cluster group containing the X: drive resource and select Rename. Enter the name TIVW2KV2. Figure 3-71 Rename the cluster resource groups168 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• 2. Now we will need to move the disk resources to the correct groups. a. Right-click the Disk Y: Z: resource under the TIVW2KV1 resource group and select Change Group -> TIVW2KV2 as shown in Figure 3-72. Figure 3-72 Changing resource groups b. Press Yes to complete the move (Figure 3-73). Figure 3-73 Resource move confirmation c. Right-click the Disk X: resource under the TIVW2KV2 resource group and select Change Group -> TIVW2KV1. d. Press Yes to complete the move. Chapter 3. High availability cluster implementation 169
• 3. The next step is to rename the resources. We do this so we can determine which resource group a resource belongs to by its name. a. Right-click the Cluster IP Address resource under the TIVW2KV1 resource group and select Rename (Figure 3-74). Enter the name TIVW2KV1 - Cluster IP Address. b. Right-click the Cluster Name resource under the TIVW2KV1 resource group and select Rename. Enter the name TIVW2KV1 - Cluster Name. c. Right-click the Disk X: resource under the TIVW2KV1 resource group and select Rename. Enter the name TIVW2KV1 - Disk X:. d. Right-click the Disk Y: Z: resource under the TIVW2KV2 resource group and select Rename. Enter the name TIVW2KV2 - Disk Y: Z:. Figure 3-74 Rename resources 170 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • 4. We now need to add two resources under the TIVW2KV2 resource group. The first resource we will add is the IP Address resource. a. Right-click the TIVW2KV2 resource group and select New -> Resource (Figure 3-75).Figure 3-75 Add a new resource Chapter 3. High availability cluster implementation 171
    • b. Enter TIVW2KV2 - IP Address in the name field and set the resource type to IP address. Click Next (Figure 3-76). Figure 3-76 Name resource and select resource type172 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• c. Select both TIVW2K1 and TIVW2K2 as possible owners of the resource. Click Next (Figure 3-77). Figure 3-77 Select resource owners Chapter 3. High availability cluster implementation 173
    • d. Click Next past the dependencies screen; no dependencies need to be defined at this time (Figure 3-78). Figure 3-78 Dependency configuration174 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • e. The next step is to configure the IP address associated with the resource. Enter the IP address 9.3.4.175 in the Address field and add the subnet mask of 255.255.255.254. Make sure the Public Network Connection is selected in the Network field and the Enable NetBIOS for this address box is checked. Click Next (Figure 3-79).Figure 3-79 Configure IP address f. Click OK to complete the installation (Figure 3-80).Figure 3-80 Completion dialog Chapter 3. High availability cluster implementation 175
    • 5. Now that the IP address resource has been created, we need to create the Name resource for the TIVW2KV2 cluster group. a. Right-click the TIVW2KV2 resource group and select New -> Resource (Figure 3-81). Figure 3-81 Adding a new resource176 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • b. Set the name of the resource to TIVW2KV2 - Cluster Name and specify the resource type to be Network Name. Click Next (Figure 3-82).Figure 3-82 Specify resource name and type Chapter 3. High availability cluster implementation 177
    • c. Next select both TIVW2K1 and TIVW2K2 as possible owners of the resource. Click Next (Figure 3-83). Figure 3-83 Select resource owners178 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • d. Click Next in the Dependencies screen (Figure 3-84). We do not need to configure these at this time.Figure 3-84 Resource dependency configuration Chapter 3. High availability cluster implementation 179
    • e. Next we will enter the cluster name for the TIVW2KV2 resource group. Enter the cluster name TIVW2KV2 in the Name field. Click Next (Figure 3-85). Figure 3-85 Cluster name f. Click OK to complete the cluster name configuration (Figure 3-86). Figure 3-86 Completion dialog180 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • 6. The final step of the cluster configuration is to bring the TIVW2KV2 resource group online. To do this, right-click the TIVW2KV2 resource group and select Bring Online (Figure 3-87).Figure 3-87 Bring resource group onlineThis concludes our cluster configuration. Chapter 3. High availability cluster implementation 181
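If you prefer to script this part of the setup, the resource configuration above also has command-line equivalents in cluster.exe. The following is only a sketch using the resource, group, and address values from this chapter; the switches and the private property names should be verified against cluster res /? and cluster res /priv on your system before relying on them:
C:\> cluster res "TIVW2KV2 - IP Address" /create /group:"TIVW2KV2" /type:"IP Address"
C:\> cluster res "TIVW2KV2 - IP Address" /priv Address=9.3.4.175 SubnetMask=255.255.255.254 Network="Public Network Connection" EnableNetBIOS=1
C:\> cluster res "TIVW2KV2 - Cluster Name" /create /group:"TIVW2KV2" /type:"Network Name"
C:\> cluster res "TIVW2KV2 - Cluster Name" /priv Name=TIVW2KV2
C:\> cluster res "TIVW2KV2 - Cluster Name" /adddep:"TIVW2KV2 - IP Address"
C:\> cluster group "TIVW2KV2" /online
Adding the dependency mirrors the usual practice of bringing a Network Name resource online only after its IP Address resource; the wizard panels above left the dependency lists empty, so add it only if your configuration calls for it.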
• 4 Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster In this chapter, we cover implementation of IBM Tivoli Workload Scheduler in an HACMP and an MSCS cluster. The chapter is divided into the following main sections: “Implementing IBM Tivoli Workload Scheduler in an HACMP cluster” on page 184 “Implementing IBM Tivoli Workload Scheduler in a Microsoft Cluster” on page 347 © Copyright IBM Corp. 2004. All rights reserved. 183
    • 4.1 Implementing IBM Tivoli Workload Scheduler in an HACMP cluster In this section, we describe the steps to implement IBM Tivoli Workload Scheduler in an HACMP cluster. We use the mutual takeover scenario described in 3.1.1, “Mutual takeover for IBM Tivoli Workload Scheduler” on page 64. Note: In this section we assume that you have finished planning your cluster and have also finished the preparation tasks to install HACMP. If you have not finished these tasks, perform the steps described in Chapter 3, “Planning and Designing an HACMP Cluster”, and the preparation tasks described in Chapter 3 “Installing HACMP”. We strongly recommend that you install IBM Tivoli Workload Scheduler before HACMP, and confirm that IBM Tivoli Workload Scheduler runs without any problem. It is important that you also confirm that IBM Tivoli Workload Scheduler is able to fallover and fallback between nodes, by manually moving the volume group between nodes. This verification procedure is described in “Verify IBM Tivoli Workload Scheduler behavior in HACMP cluster” on page 202.4.1.1 IBM Tivoli Workload Scheduler implementation overview Figure 4-1 on page 185 shows a diagram of a IBM Tivoli Workload Scheduler implementation in a mutual takeover HACMP cluster. Using this diagram, we will describe how IBM Tivoli Workload Scheduler could be implemented, and what you should be aware of. Though we do not describe a hot standby scenario of IBM Tivoli Workload Scheduler, the steps used to configure IBM Tivoli Workload Scheduler for a mutual takeover scenario also cover what should be done for a hot standby scenario.184 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Cluster: cltivoli Mount Point 1: Mount Point 1: /usr/maestro /usr/maestro User: User: maestro maestro TWS Engine1 nmport=31111 Mount Point 2: IP=tivaix1_svc Mount Point 2: /usr/ /usr/maestro2 maestro2 User: TWS Engine2 User: maestro2 nmport=31112 maestro2 IP=tivaix2_svc tivaix2 tivaix1Figure 4-1 IBM Tivoli Workload Scheduler implementation overview To make IBM Tivoli Workload Scheduler highly available in an HACMP cluster, the IBM Tivoli Workload Scheduler instance should be installed on the external shared disk. This means that the /TWShome directory should reside on the shared disk and not the locally attached disk. This is the bottom line to enable HACMP to relocate the IBM Tivoli Workload Scheduler engine from one node to another, along with other system components such as external disks and service IP labels. When implementing IBM Tivoli Workload Scheduler in a cluster, there are certain items you should be aware of, such as the location of the IBM Tivoli Workload Scheduler engine and the IP address used for IBM Tivoli Workload Scheduler workstation definition. Specifically for a mutual takeover scenario, you have more to consider, as there will be multiple instances of IBM Tivoli Workload Scheduler running on one node. Following are the considerations you need to keep in mind when implementing IBM Tivoli Workload Scheduler in an HACMP cluster. The following considerations apply for Master Domain Manager, Domain Manager, Backup Domain Manager and FTA. Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster 185
    • Location of IBM Tivoli Workload Scheduler engine executables As mentioned earlier, IBM Tivoli Workload Scheduler engine should be installed in the external disk to be serviced by HACMP. In order to have the same instance of IBM Tivoli Workload Scheduler process its job on another node after a fallover, executables must be installed on the external disk. For Version 8.2, all files essential to IBM Tivoli Workload Scheduler processing are installed in the /TWShome directory. The /TWShome directory should reside on file systems on the shared disk. For versions prior to 8.2, IBM Tivoli Workload Scheduler executables should be installed in a file system with the mount point above the /TWShome directory. For example, if /TWShome is /usr/maestro/maestro, the mount point should be /usr/maestro. In a mutual takeover scenario, you may have a case where multiple instances of IBM Tivoli Workload Scheduler are installed on the shared disk. In such a case, make sure these instances are installed on separate file systems residing on separate volume groups. Creating mount points on standby nodes Create a mount point for the IBM Tivoli Workload Scheduler file system on all nodes that may run that instance of IBM Tivoli Workload Scheduler. When configuring for a mutual takeover, make sure that you create mount points for every IBM Tivoli Workload Scheduler instance that may run a node. In Figure 4-1 on page 185, nodes tivaix1 and tivaix2 may both have two instances of IBM Tivoli Workload Scheduler engine running in case of a node failure. Note that in the diagram, both nodes have mount points for TWS Engine1 and TWS Engine2. IBM Tivoli Workload Scheduler user account and group account On each node, create a IBM Tivoli Workload Scheduler user and group for all IBM Tivoli Workload Scheduler instances that may run on the node. The user’s home directory must be set to /TWShome. If a IBM Tivoli Workload Scheduler instance will fallover and fallback among several nodes in a cluster, make sure all those nodes have the IBM Tivoli Workload Scheduler user and group defined to control that instance. In the mutual takeover scenario, you may have multiple instances running at the same time on one node. Make sure you create separate users for each IBM Tivoli Workload Scheduler instance in your cluster so that you are able to control them separately. In our scenario, we add user maestro and user maestro2 on both nodes because TWS Engine1 and TWS Engine2 should be able to run on both nodes. The same group accounts should be created on both nodes to host these users.186 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
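The mount point and user/group considerations above can also be handled directly from the command line instead of through SMIT (the SMIT procedure is shown in 4.1.2, “Preparing to install”). The following is only a sketch using the values from our scenario; the group ID 2000 and user ID 1001 match the SMIT panels shown later, while the user ID 1002 for maestro2 is an assumption for illustration. Run the equivalent commands on every node that can host the instances:
# mkdir -p /usr/maestro /usr/maestro2      (mount points for both shared file systems, created on both nodes)
# mkgroup id=2000 tivoli                   (same group name and group ID on every node)
# mkuser id=1001 pgrp=tivoli groups=tivoli,staff home=/usr/maestro maestro
# mkuser id=1002 pgrp=tivoli groups=tivoli,staff home=/usr/maestro2 maestro2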
    • Netman portWhen there will be only one instance of IBM Tivoli Workload Schedulerrunning on a node, using the default port (31111) is sufficient.For a mutual takeover scenario, you need to consider setting different portnumbers for each IBM Tivoli Workload Scheduler instance in the cluster. Thisis because several instances of IBM Tivoli Workload Scheduler may run onthe same node, and no IBM Tivoli Workload Scheduler instance on the samenode should have same netman port. In our scenario, we set the netman portof TWS Engine1 to 31111, and the netman port of TWS Engine2 to 31112.IP addressThe IP address or IP label specified in the workstation definition should be theservice IP address or the service IP label for HACMP. If you plan a fallover ora fallback for an IBM Tivoli Workload Scheduler instance, it should not use anIP address or IP label that is bound to a particular node. (Boot address andpersistent address used in an HACMP cluster are normally bound to onenode, so these should not be used.) This is to ensure that IBM TivoliWorkload Scheduler instance does not lose connection with other IBM TivoliWorkload Scheduler instances in case of a fallover or a fallback.In our diagram, note that TWS_Engine1 uses a service IP address calledtivaix1_service, and TWS_Engine2 uses a service IP address calledtivaix2_service. These service IP address will move along with the IBM TivoliWorkload Scheduler instance from one node to another.Starting and stopping IBM Tivoli Workload Scheduler instancesIBM Tivoli Workload Scheduler instances should be started and stopped fromHACMP application start and stop scripts. Generate a custom script to startand stop each IBM Tivoli Workload Scheduler instance in your cluster, thenwhen configuring HACMP, associate your custom scripts to resource groupsthat your IBM Tivoli Workload Scheduler instances reside in.If you put IBM Tivoli Workload Scheduler under the control of HACMP, itshould not be started from /etc/inittab or from any other way except forapplication start and stop scripts.Files installed on the local diskThough most IBM Tivoli Workload Scheduler executables are installed in theIBM Tivoli Workload Scheduler file system, some files are installed on localdisks. You may have to copy these local files to other nodes.For IBM Tivoli Workload Scheduler 8.2, copy the/usr/Tivoli/TWS/TKG/3.1.5/lib/libatrc.a file. Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster 187
• For IBM Tivoli Workload Scheduler 8.1, you may need to copy the following files to any node in the cluster that will host the IBM Tivoli Workload Scheduler instance: – /usr/unison/components – /usr/lib/libatrc.a – /usr/Tivoli/TWS/TKG/3.1.5/lib/libatrc.a Monitoring the IBM Tivoli Workload Scheduler process HACMP is able to monitor application processes. It can be configured to initiate a cluster event based on application process failures. When considering monitoring TWS using HACMP’s application monitoring, keep in mind that IBM Tivoli Workload Scheduler stops and restarts all its processes (excluding the netman process) every 24 hours. The recycling of the processes is initiated by the FINAL jobstream, which is set to run at a certain time every day. Be aware that if you configure HACMP to initiate an action in the event of a TWS process failure, this expected behavior of IBM Tivoli Workload Scheduler could be interpreted as a failure of IBM Tivoli Workload Scheduler processes, and could trigger an unwanted action. If you simply want to monitor process failures, we recommend that you use monitoring software (for example, IBM Tivoli Monitoring). 4.1.2 Preparing to install Before installing IBM Tivoli Workload Scheduler in an HACMP cluster, define the IBM Tivoli Workload Scheduler group and user account on each node that will host IBM Tivoli Workload Scheduler. The following procedure presents an example of how to prepare for an installation of IBM Tivoli Workload Scheduler 8.2 on AIX 5.2. We assume that the IBM Tivoli Workload Scheduler file system is already created as described in 3.2.3, “Planning and designing an HACMP cluster” on page 67. In our scenario, we added a group named tivoli and users maestro and maestro2 on each node. 1. Creating group accounts Execute the following on all the nodes that the IBM Tivoli Workload Scheduler instance will run on. a. Enter the following command; this will take you to the SMIT Groups menu: # smitty groups b. From the Groups menu, select Add a Group. c. Enter a value for each of the following items: 188 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • Group NAME Assign a name for the group. ADMINISTRATIVE Group true Group ID Assign a group ID. Assign the same ID for all nodes in the cluster. Figure 4-2 shows an example of adding a group. We added group tivoli with an ID 2000. Add a Group Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] * Group NAME [tivoli] ADMINISTRATIVE group? true + Group ID [2000] # USER list [] + ADMINISTRATOR list [] + F1=Help F2=Refresh F3=Cancel F4=List Esc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=Image Esc+9=Shell Esc+0=Exit Enter=DoFigure 4-2 Adding a group 2. Adding IBM Tivoli Workload Scheduler users Perform the following procedures for all nodes in the cluster: a. Enter the following command; this will take you to the SMIT Users menu: # smitty user b. From the Users menu, select Add a User. c. Enter the values for the following item, then press Enter. The other items should be left as it is. User NAME Assign a name for the user. Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster 189
    • User ID Assign an ID for the user. This ID for the user should be the same on all nodes. ADMINISTRATIVE USER? false Primary GROUP Set the group that you have defined in the previous step. Group SET Set the primary group and the staff group. HOME directory Set /TWShome. Figure 4-3 shows an example of a IBM Tivoli Workload Scheduler user definition. In the example, we defined maestro user. Add a User Type or select values in entry fields. Press Enter AFTER making all desired changes. [TOP] [Entry Fields] * User NAME [maestro] User ID [1001] # ADMINISTRATIVE USER? false + Primary GROUP [tivoli] + Group SET [tivoli,staff] + ADMINISTRATIVE GROUPS [] + ROLES [] + Another user can SU TO USER? true + SU GROUPS [ALL] + HOME directory [/usr/maestro] Initial PROGRAM [] User INFORMATION [] EXPIRATION date (MMDDhhmmyy) [0] [MORE...37] F1=Help F2=Refresh F3=Cancel F4=List Esc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=Image Esc+9=Shell Esc+0=Exit Enter=DoFigure 4-3 Defining a user d. After you have added the user, modify the $HOME/.profile of the user. Modify the PATH variable to include the /TWShome and /TWShome/bin directory. This enables you to run IBM Tivoli Workload Scheduler commands in any directory as long as you are logged in as the IBM Tivoli Workload Scheduler user. Also add the TWS_TISDIR variable. The value for the TWS_TISDIR should be the /TWShome directory. The TWS_TISDIR enables IBM Tivoli Workload Scheduler to display190 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
    • messages in the correct language codeset. Example 4-1 shows an example of how the variable should be defined. In the example, /usr/maestro is the /TWShome directory. Example 4-1 An example .profile for TWSusr PATH=/TWShome:/TWShome/bin:$PATH export PATH TWS_TISDIR=/usr/maestro export TWS_TISDIR4.1.3 Installing the IBM Tivoli Workload Scheduler engine In this section, we show you the steps to install IBM Tivoli Workload Scheduler 8.2 Engine (Master Domain Manager) from the command line. For procedures to install IBM Tivoli Workload Scheduler using the graphical user interface, refer to IBM Tivoli Workload Scheduler Planning and Installation Guide Version 8.2, SC32-1273. In our scenario, we installed two TWS instances called TIVAIX1 and TIVAIX2 on a shared external disk. TIVAIX1 was installed from node tivaix1, and TIVAIX2 was installed from tivaix2. We used the following steps to do this. 1. Before installing, identify the following items. These items are required when running the installation script. – workstation type - master – workstation name - The name of the workstation. This is the value for the host field that you specify in the workstation definition. It will also be recorded in the globalopts file. – netman port - Specify the listening port for netman. We remind you again that if you plan to have several instances of IBM Tivoli Workload Scheduler running on machine, make sure you specify different port numbers for each IBM Tivoli Workload Scheduler instance. – company name - Specify this if you would like your company name in reports produced by IBM Tivoli Workload Scheduler report commands. 2. Log in to the node where you want to install the IBM Tivoli Workload Scheduler engine, as a root user. 3. Confirm that the IBM Tivoli Workload Scheduler file system is mounted. If it is not mounted, use the mount command to mount the IBM Tivoli Workload Scheduler file system. 4. Insert IBM Tivoli Workload Scheduler Installation Disk 1. Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster 191
    • 5. Locate the twsinst script in the directory of the platform on which you want to run the script. The following is an example of installing a Master Domain Manager named TIVAIX1. # ./twsinst -new -uname twsusr -cputype master -thiscpu cpuname -master cpuname -port port_no -company company_name Where: – twsusr - The name of the IBM Tivoli Workload Scheduler user. – master - The workstation type. Refer to IBM Tivoli Workload Scheduler Planning and Installation Guide Version 8.2, SC32-1273, for other options – cpuname - The name of the workstation. For -thiscpu, specify the name of the workstation that you are installing. For -master, specify the name of the Master Domain Manager. When installing the Master Domain Manager, specify the same value for -thiscpu and -master. – port_no - Specify the port number that netman uses to receive incoming messages other workstations. – company_name - The name of your company (optional) Example 4-2 shows sample command syntax for installing Master Domain Manager TIVAIX1. Example 4-2 twsinst script example for TIVAIX1 # ./twsinst -new -uname maestro -cputype master -thiscpu tivaix1 -master tivaix1 -port 31111 -company IBM Example 4-3 shows sample command syntax for installing Master Domain Manager TIVAIX2. Example 4-3 twsinst script example for TIVAIX2 # ./twsinst -new -uname maestro2 -cputype master -thiscpu tivaix2 -master tivaix2 -port 31112 -company IBM4.1.4 Configuring the IBM Tivoli Workload Scheduler engine After you have installed the IBM Tivoli Workload Scheduler engine as a Master Domain Manager, perform the following configuration tasks. These are the minimum tasks that you should perform to get IBM Tivoli Workload Scheduler Master Domain Manager running. For instructions on configuring other types of workstation, such as Fault Tolerant Agents and Domain Managers, refer to Tivoli Workload Scheduler Job Scheduling Console User’s Guide, SH19-4552, or Tivoli Workload Scheduler Version 8.2, Reference Guide, SC32-1274.192 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• Checking the workstation definition In order to have IBM Tivoli Workload Scheduler serviced correctly by HACMP in the event of a fallover, it must have the service IP label or the service IP address defined in its workstation definition. When installing a Master Domain Manager (master), the workstation definition is added automatically. After you have installed IBM Tivoli Workload Scheduler, check the workstation definition of the master and verify that the service IP label or the address is associated with the master. 1. Log into the master workstation as the TWSuser. 2. Execute the following command; this opens a text editor with the master’s CPU definition: $ composer “modify cpu=master_name” Where: – master_name - the workstation name of the master. Example 4-4 and Example 4-5 give the workstation definitions for workstations TIVAIX1 and TIVAIX2 that we installed. Notice that the value for NODE is set to the service IP label in each workstation definition. Example 4-4 Workstation definition for TIVAIX1 CPUNAME TIVAIX1 DESCRIPTION "MASTER CPU" OS UNIX NODE tivaix1_svc DOMAIN MASTERDM TCPADDR 31111 FOR MAESTRO AUTOLINK ON RESOLVEDEP ON FULLSTATUS ON END Example 4-5 Workstation definition for TIVAIX2 CPUNAME TIVAIX2 DESCRIPTION "MASTER CPU" OS UNIX NODE tivaix2_svc DOMAIN MASTERDM TCPADDR 31112 FOR MAESTRO AUTOLINK ON RESOLVEDEP ON FULLSTATUS ON END Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster 193
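While reviewing the definitions, it can also help to confirm from the command line that the service IP labels resolve and that each engine is listening on the netman port its definition expects. This is only a sketch for our environment; the nm port option is kept in each instance’s localopts file:
# host tivaix1_svc
# host tivaix2_svc
# grep "nm port" /usr/maestro/localopts /usr/maestro2/localopts     (should show 31111 and 31112, matching the TCPADDR values above)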
• 3. If the value for NODE is set to the service IP label correctly, then close the workstation definition. If it is not set correctly, then modify the file and save it. Adding the FINAL jobstream The FINAL jobstream is responsible for generating daily production files. Without this jobstream, IBM Tivoli Workload Scheduler is unable to perform daily job processing. IBM Tivoli Workload Scheduler provides a definition file that you can use to add this FINAL jobstream. The following steps describe how to add the FINAL jobstream using this file. 1. Log in as the IBM Tivoli Workload Scheduler user. 2. Add the FINAL schedule by running the following command. $ composer "add Sfinal" 3. Run Jnextday to create the production file. $ Jnextday 4. Check the status of IBM Tivoli Workload Scheduler by issuing the following command. $ conman status If IBM Tivoli Workload Scheduler started correctly, the status should be Batchman=LIVES. 5. Check that all IBM Tivoli Workload Scheduler processes (netman, mailman, batchman, jobman) are running. Example 4-6 illustrates checking for the IBM Tivoli Workload Scheduler process. Example 4-6 Checking for IBM Tivoli Workload Scheduler process $ ps -ef | grep -v grep | grep maestro maestro2 14484 31270 0 16:59:41 - 0:00 /usr/maestro2/bin/batchman -parm 32000 maestro2 16310 13940 1 16:00:29 pts/0 0:00 -ksh maestro2 26950 1 0 22:38:59 - 0:00 /usr/maestro2/bin/netman maestro2 28658 16310 2 17:00:07 pts/0 0:00 ps -ef root 29968 14484 0 16:59:41 - 0:00 /usr/maestro2/bin/jobman maestro2 31270 26950 0 16:59:41 - 0:00 /usr/maestro2/bin/mailman -parm 3 2000 -- 2002 TIVAIX2 CONMAN UNIX 8.2 MESSAGE $ 4.1.5 Installing IBM Tivoli Workload Scheduler Connector If you plan to use JSC to perform administration tasks for IBM Tivoli Workload Scheduler, install the IBM Tivoli Workload Scheduler connector. The IBM Tivoli Workload Scheduler connector must be installed on any TMR server or Managed 194 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• Node that is running IBM Tivoli Workload Scheduler Master Domain Manager. Optionally, the connector could be installed on any Domain Manager or FTA, provided that a Managed Node is also installed on that system. Note: Tivoli Management Framework should be installed prior to IBM Tivoli Workload Scheduler Connector installation. For instructions on installing a TMR server, refer to Chapter 5 or to Tivoli Enterprise Installation Guide Version 4.1, GC32-0804. In this section, we assume that you have already installed Tivoli Management Framework, and have applied the latest set of fix packs. Here we describe the steps to install Job Scheduling Services (a prerequisite to install IBM Tivoli Workload Scheduler Connector) and IBM Tivoli Workload Scheduler Connector by using the command line. For instructions on installing IBM Tivoli Workload Scheduler Connector from the Tivoli Desktop, refer to Tivoli Workload Scheduler Job Scheduling Console User’s Guide, SH19-4552. For our mutual takeover scenario, each node in our two-node HACMP cluster (tivaix1, tivaix2) hosts a TMR server. We installed IBM Tivoli Workload Scheduler Connector on each of the two cluster nodes. 1. Before installing, identify the following items. These items are required when running the IBM Tivoli Workload Scheduler Connector installation script. – Node name to install IBM Tivoli Workload Scheduler Connector - This must be the name defined in the Tivoli Management Framework. – The full path to the installation image - For Job Scheduling Services, it is the directory with the TMF_JSS.IND file. For IBM Tivoli Workload Scheduler Connector, it is the directory with the TWS_CONN.IND file. – IBM Tivoli Workload Scheduler installation directory - The /TWShome directory. – Connector Instance Name - A name for the connector instance. – Instance Owner - The name of the IBM Tivoli Workload Scheduler user. 2. Insert the IBM Tivoli Workload Scheduler Installation Disk 1. 3. Log in on the TMR server as the root user. 4. Run the following command to source the Tivoli environment variables: # . /etc/Tivoli/setup_env.sh 5. Run the following command to install Job Scheduling Services: # winstall -c install_dir -i TMF_JSS nodename Where: – install_dir - the path to the installation image Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster 195
    • – nodename - the name of the TMR server or the Managed Node that you are installing JSS on. The command will perform a prerequisite verification, and you will be prompted to proceed with the installation or not. Example 4-7 illustrates the execution of the command. Example 4-7 Installing JSS from the command line # winstall -c /usr/sys/inst.images/tivoli/wkb/TWS820_1/TWS_CONN -i TMF_JSS tivaix1 Checking product dependencies... Product TMF_3.7.1 is already installed as needed. Dependency check completed. Inspecting node tivaix2... Installing Product: Tivoli Job Scheduling Services v1.2 Unless you cancel, the following operations will be executed: For the machines in the independent class: hosts: tivaix2 need to copy the CAT (generic) to: tivaix2:/usr/local/Tivoli/msg_cat For the machines in the aix4-r1 class: hosts: tivaix2 need to copy the BIN (aix4-r1) to: tivaix2:/usr/local/Tivoli/bin/aix4-r1 need to copy the ALIDB (aix4-r1) to: tivaix2:/usr/local/Tivoli/spool/tivaix2.db Continue([y]/n)? Creating product installation description object...Created. Executing queued operation(s) Distributing machine independent Message Catalogs --> tivaix2 Completed. Distributing architecture specific Binaries --> tivaix2 Completed. Distributing architecture specific Server Database --> tivaix2 ....Product install completed successfully. Completed. Registering product installation attributes...Registered.196 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• 6. Verify that Job Scheduling Services was installed by running the following command: # wlsinst -p This command shows a list of all the Tivoli products installed in your environment. You should see in the list “Tivoli Job Scheduling Services v1.2”. Example 4-8 shows an example of the command output. The 10th line shows that JSS was installed successfully. Example 4-8 wlsinst -p command output # wlsinst -p Tivoli Management Framework 4.1 Tivoli ADE, Version 4.1 (build 09/19) Tivoli AEF, Version 4.1 (build 09/19) Tivoli Java Client Framework 4.1 Java 1.3 for Tivoli Tivoli Java RDBMS Interface Module (JRIM) 4.1 JavaHelp 1.0 for Tivoli 4.1 Tivoli Software Installation Service Client, Version 4.1 Tivoli Software Installation Service Depot, Version 4.1 Tivoli Job Scheduling Services v1.2 Distribution Status Console, Version 4.1 # 7. To install IBM Tivoli Workload Scheduler Connector, run the following command: # winstall -c install_dir -i TWS_CONN twsdir=/TWShome iname=instance owner=twsuser createinst=1 nodename Where: – install_dir - the path of the installation image. – twsdir - set this to /TWShome. – iname - the name of the IBM Tivoli Workload Scheduler Connector instance. – owner - the name of the IBM Tivoli Workload Scheduler user. 8. Verify that IBM Tivoli Workload Scheduler Connector was installed by running the following command. # wlsinst -p This command shows a list of all the Tivoli products installed in your environment. You should see in the list “TWS Connector 8.2”. The following is an example of a command output. The 11th line shows that IBM Tivoli Workload Scheduler Connector was installed successfully. Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster 197
• Example 4-9 wlsinst -p command output # wlsinst -p Tivoli Management Framework 4.1 Tivoli ADE, Version 4.1 (build 09/19) Tivoli AEF, Version 4.1 (build 09/19) Tivoli Java Client Framework 4.1 Java 1.3 for Tivoli Tivoli Java RDBMS Interface Module (JRIM) 4.1 JavaHelp 1.0 for Tivoli 4.1 Tivoli Software Installation Service Client, Version 4.1 Tivoli Software Installation Service Depot, Version 4.1 Tivoli Job Scheduling Services v1.2 Tivoli TWS Connector 8.2 Distribution Status Console, Version 4.1 # 4.1.6 Setting the security After you have installed IBM Tivoli Workload Scheduler Connectors, apply changes to the IBM Tivoli Workload Scheduler Security file so that users can access IBM Tivoli Workload Scheduler through JSC. If you grant access to a Tivoli Administrator, then any operating system user associated with that Tivoli Administrator is granted access through JSC. For more information on the IBM Tivoli Workload Scheduler Security file, refer to Tivoli Workload Scheduler Version 8.2 Installation Guide, SC32-1273. To modify the security file, follow the procedures described in this section. For our scenario, we added the name of two Tivoli Administrators, Root_tivaix1-region and Root_tivaix2-region, to the Security file of each Master Domain Manager. Root_tivaix1-region is a Tivoli Administrator on tivaix1, and Root_tivaix2-region is a Tivoli Administrator on tivaix2. This will make each IBM Tivoli Workload Scheduler Master Domain Manager accessible from either of the two TMR servers. In the event of a fallover, the IBM Tivoli Workload Scheduler Master Domain Manager remains accessible from JSC through the Tivoli Administrator on the surviving node. 1. Log into the IBM Tivoli Workload Scheduler master as the TWSuser. TWSuser is the user you have used to install IBM Tivoli Workload Scheduler. 2. Run the following command to dump the Security file to a text file. # dumpsec > /tmp/sec.txt 3. Modify the security file and save your changes. Add the name of the Tivoli Administrators to the LOGON clause. 198 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
• Example 4-10 illustrates a security file. This security file grants full privileged access to Tivoli Administrators called Root_tivaix1-region and Root_tivaix2-region. Example 4-10 Example of a security file USER MAESTRO CPU=@+LOGON=maestro,root,Root_tivaix2-region,Root_tivaix1-region BEGIN USEROBJ CPU=@ ACCESS=ADD,DELETE,DISPLAY,MODIFY,ALTPASS JOB CPU=@ ACCESS=ADD,ADDDEP,ALTPRI,CANCEL,CONFIRM,DELDEP,DELETE,DISPLAY,KILL,MODIFY,RELEASE,REPLY,RERUN,SUBMIT,USE,LIST SCHEDULE CPU=@ ACCESS=ADD,ADDDEP,ALTPRI,CANCEL,DELDEP,DELETE,DISPLAY,LIMIT,MODIFY,RELEASE,REPLY,SUBMIT,LIST RESOURCE CPU=@ ACCESS=ADD,DELETE,DISPLAY,MODIFY,RESOURCE,USE,LIST PROMPT ACCESS=ADD,DELETE,DISPLAY,MODIFY,REPLY,USE,LIST FILE NAME=@ ACCESS=CLEAN,DELETE,DISPLAY,MODIFY CPU CPU=@ ACCESS=ADD,CONSOLE,DELETE,DISPLAY,FENCE,LIMIT,LINK,MODIFY,SHUTDOWN,START,STOP,UNLINK,LIST PARAMETER CPU=@ ACCESS=ADD,DELETE,DISPLAY,MODIFY CALENDAR ACCESS=ADD,DELETE,DISPLAY,MODIFY,USE END 4. Verify your security file by running the following command. Make sure that no errors or warnings are displayed. $ makesec -v /tmp/sec.txt Note: Running the makesec command with the -v option only verifies your security file to see that there are no syntax errors. It does not update the security database. Example 4-11 shows the sample output of the makesec -v command: Example 4-11 Output of makesec -v command $ makesec -v /tmp/sec.txt TWS for UNIX (AIX)/MAKESEC 8.2 (9.3.1.1) Licensed Materials Property of IBM 5698-WKB (C) Copyright IBM Corp 1998,2003 US Government User Restricted Rights Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Chapter 4. High availability cluster implementation 199
MAKESEC:Starting user MAESTRO [/tmp/sec.txt (#2)]
MAKESEC:Done with /tmp/sec.txt, 0 errors (0 Total)
$

5. If there are no errors, compile the security file with the following command:

   $ makesec /tmp/sec.txt

   Example 4-12 illustrates the output of the makesec command.

Example 4-12   Output of makesec command

$ makesec /tmp/sec.txt
TWS for UNIX (AIX)/MAKESEC 8.2 (9.3.1.1)
Licensed Materials Property of IBM
5698-WKB
(C) Copyright IBM Corp 1998,2003
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
MAKESEC:Starting user MAESTRO [/tmp/sec.txt (#2)]
MAKESEC:Done with /tmp/sec.txt, 0 errors (0 Total)
MAKESEC:Security file installed as /usr/maestro/Security
$

6. When applying changes to the security file, the connector instance should be stopped to allow the change to take effect. Run the following commands to source the Tivoli environment variables and stop the connector instance:

   $ . /etc/Tivoli/setup_env.sh
   $ wmaeutil inst_name -stop "*"

   where inst_name is the name of the instance you would like to stop.

   Example 4-13 shows an example of the wmaeutil command used to stop a connector instance called TIVAIX1.

Example 4-13   Output of wmaeutil command

$ . /etc/Tivoli/setup_env.sh
$ wmaeutil TIVAIX1 -stop "*"
AWSBCT758I Done stopping the ENGINE server
AWSBCT758I Done stopping the DATABASE server
AWSBCT758I Done stopping the PLAN server
$

   Note: You do not need to manually restart the connector instance, as it is automatically started when a user logs in to JSC.
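   On a node that hosts more than one connector instance (as in the mutual takeover scenario, where both instances may run on the surviving node), the same stop can be applied to every registered instance. The following is a sketch only; it assumes the Tivoli environment script is in its default location:

   # Sketch: stop every connector instance registered on this node after a
   # Security file change. Instances restart automatically at the next JSC login.
   . /etc/Tivoli/setup_env.sh
   for inst in `wlookup -ar MaestroEngine | awk '{print $1}'`
   do
       wmaeutil "$inst" -stop "*"
   done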
7. Verify that the changes in the security file are effective by running the dumpsec command. This dumps the current content of the security file into a text file. Open the text file and confirm that the change you made is reflected:

   $ dumpsec > filename

   where filename is the name of the text file.

8. Verify that the changes are effective by logging into JSC as a user you added to the security file.

4.1.7 Add additional IBM Tivoli Workload Scheduler Connector instance

One IBM Tivoli Workload Scheduler Connector instance can only be mapped to one IBM Tivoli Workload Scheduler instance. In our mutual takeover scenario, one TMR server would be hosting two instances of IBM Tivoli Workload Scheduler if a fallover occurs. An additional IBM Tivoli Workload Scheduler Connector instance is therefore required on each node so that a user can access both instances of IBM Tivoli Workload Scheduler on the surviving node. We added a connector instance to each node to control both IBM Tivoli Workload Scheduler Master Domain Managers TIVAIX1 and TIVAIX2.

To add an additional IBM Tivoli Workload Scheduler Connector instance, perform the following tasks.

   Note: You must install the Job Scheduling Services and IBM Tivoli Workload Scheduler Connector Framework products before performing these tasks.

1. Log into a cluster node as root.

2. Source the Tivoli environment variables by running the following command:

   # . /etc/Tivoli/setup_env.sh

3. List the existing connector instances:

   # wlookup -ar MaestroEngine

   Example 4-14 shows one IBM Tivoli Workload Scheduler Connector instance called TIVAIX1.

Example 4-14   Output of wlookup command before adding additional instance

# wlookup -ar MaestroEngine
TIVAIX1 1394109314.1.661#Maestro::Engine#
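   If you script this part of the configuration (for example, as part of preparing both cluster nodes), you may want to guard the creation performed in the next step so that it is only attempted when the instance is not yet registered. The following is a sketch; the instance name and directory are assumptions based on our scenario and should be adjusted to your environment:

   # Sketch: create the connector instance only if it is not already registered.
   # INSTANCE and TWSDIR are placeholders: the connector name and the home
   # directory of the IBM Tivoli Workload Scheduler instance it should manage.
   INSTANCE=TIVAIX2
   TWSDIR=/usr/maestro2     # assumption: home of the second TWS instance

   . /etc/Tivoli/setup_env.sh
   if wlookup -ar MaestroEngine | grep -w "$INSTANCE" > /dev/null 2>&1
   then
       echo "Connector instance $INSTANCE already exists"
   else
       wtwsconn.sh -create -n "$INSTANCE" -t "$TWSDIR"
   fi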
4. Add an additional connector instance:

   # wtwsconn.sh -create -n instance_name -t TWS_directory

   where:
   instance_name - the name of the instance you would like to add.
   TWS_directory - the path where the IBM Tivoli Workload Scheduler engine associated with the instance resides.

   Example 4-15 shows output for the wtwsconn.sh command. We added a TWS Connector instance called TIVAIX2. This instance is for accessing the IBM Tivoli Workload Scheduler engine installed in the /usr/maestro directory.

Example 4-15   Sample wtwsconn.sh command

# wtwsconn.sh -create -n TIVAIX2 -t /usr/maestro
Scheduler engine created
Created instance: TIVAIX2, on node: tivaix1
MaestroEngine maestroHomeDir attribute set to: /usr/maestro2
MaestroPlan maestroHomeDir attribute set to: /usr/maestro2
MaestroDatabase maestroHomeDir attribute set to: /usr/maestro2

5. Run the wlookup -ar command again to verify that the instance was successfully added. The IBM Tivoli Workload Scheduler Connector that you have just added should show up in the list:

   # wlookup -ar MaestroEngine

   Example 4-16 shows that the IBM Tivoli Workload Scheduler Connector instance TIVAIX2 has been added to the list.

Example 4-16   Output of wlookup command after adding additional instance

# wlookup -ar MaestroEngine
TIVAIX1 1394109314.1.661#Maestro::Engine#
TIVAIX2 1394109314.1.667#Maestro::Engine#
#

4.1.8 Verify IBM Tivoli Workload Scheduler behavior in HACMP cluster

When you have finished installing IBM Tivoli Workload Scheduler, verify that IBM Tivoli Workload Scheduler is able to move from one node to another, and that it is able to run on the standby node(s) in the cluster.

It is important that you perform this task manually before applying fix packs, and also before you install HACMP.
Making sure that IBM Tivoli Workload Scheduler behaves as expected before each major change simplifies troubleshooting in case you have issues with IBM Tivoli Workload Scheduler. If you apply IBM Tivoli Workload Scheduler fix packs and install HACMP, and then find that IBM Tivoli Workload Scheduler behaves unexpectedly, it would be difficult to determine the cause of the problem. Though it may seem cumbersome, we strongly recommend that you verify IBM Tivoli Workload Scheduler behavior before you make a change to a system. The sequence of the verification is as follows.

1. Stop IBM Tivoli Workload Scheduler on a cluster node. Log in as TWSuser and run the following command:

   $ conman "shut ;wait"

2. Migrate the volume group to another node. Refer to the volume group migration procedure described in “Define the shared LVM components” on page 94.

3. Start IBM Tivoli Workload Scheduler on the node by running the conman start command:

   $ conman start

4. Verify the batchman status. Make sure the Batchman status is LIVES:

   $ conman status

5. Verify that all IBM Tivoli Workload Scheduler processes are running by issuing the ps command:

   $ ps -ef | grep -v grep | grep maestro

   Example 4-17 shows an example of ps command output. Check that the netman, mailman, batchman and jobman processes are running for each IBM Tivoli Workload Scheduler instance installed.

Example 4-17   Output of ps command

$ ps -ef | grep -v grep | grep maestro
 maestro 26378 43010   1 18:46:58  pts/1  0:00 -ksh
    root 30102 34192   0 18:49:59      -  0:00 /usr/maestro/bin/jobman
 maestro 33836 38244   0 18:49:59      -  0:00 /usr/maestro/bin/mailman -parm32000 -- 2002 TIVAIX1 CONMAN UNIX 8.2 MESSAGE
 maestro 34192 33836   0 18:49:59      -  0:00 /usr/maestro/bin/batchman -parm32000
 maestro 38244     1   0 18:49:48      -  0:00 /usr/maestro/bin/netman
 maestro 41214 26378   4 18:54:52  pts/1  0:00 ps -ef
$
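   Steps 3 through 5 can also be collected into a small script that is rerun after each fix pack or HACMP change, and that can later serve as a starting point for an HACMP application monitor. This is a sketch only; it assumes that it is run from a TWSuser session in which the IBM Tivoli Workload Scheduler environment is already set, and that the instance home is /usr/maestro, as in our scenario:

   #!/bin/sh
   # Sketch: repeat the checks of steps 3 through 5 for one TWS instance.
   # TWSHOME is an assumption; point it at the instance you want to check.
   TWSHOME=/usr/maestro

   # Step 4: the conman status banner must report that batchman is running.
   $TWSHOME/bin/conman status 2>/dev/null | grep "Batchman LIVES" > /dev/null || exit 1

   # Step 5: the core TWS processes must be present.
   for proc in netman mailman batchman jobman
   do
       ps -ef | grep -v grep | grep "$TWSHOME/bin/$proc" > /dev/null || exit 1
   done

   echo "IBM Tivoli Workload Scheduler instance in $TWSHOME looks healthy"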
6. If using JSC, log into the IBM Tivoli Workload Scheduler Master Domain Manager. Verify that you are able to see the scheduling objects and the production plan.

4.1.9 Applying IBM Tivoli Workload Scheduler fix pack

When you have completed installing IBM Tivoli Workload Scheduler and the IBM Tivoli Workload Scheduler Connector, apply the latest fix pack available. For instructions on installing the fix pack for the IBM Tivoli Workload Scheduler engine, refer to the README file included in each fix pack.

The IBM Tivoli Workload Scheduler engine fix pack can be applied either from the command line by using the twspatch script, or from the Java-based graphical user interface. The IBM Tivoli Workload Scheduler Connector fix pack is applied from the Tivoli Desktop. Because instructions on applying IBM Tivoli Workload Scheduler Connector fix packs are not documented in the fix pack README, we describe the procedures to install IBM Tivoli Workload Scheduler Connector fix packs here.

Before applying any of the fix packs, make sure you have a viable backup.

   Note: The same level of fix pack should be applied to the IBM Tivoli Workload Scheduler engine and the IBM Tivoli Workload Scheduler Connector. If you apply a fix pack to the IBM Tivoli Workload Scheduler engine, make sure you apply the same level of fix pack for IBM Tivol