Service Level Management Using IBM Tivoli Service Level Advisor and Tivoli Business Systems Manager (SG24-6464)

  • Front cover
    Service Level Management Using IBM Tivoli Service Level Advisor and Tivoli Business Systems Manager
    Integrate Tivoli Business Systems Manager and Tivoli Service Level Advisor
    Map business service management to service level management
    Achieve proactive service level management
    Edson Manoel, Kimberly Cox, Eswara Kosaraju, Matt Roseblade, Alex Shafir, Venkat Surath, Eduardo Tanaka, Brian Watson
    ibm.com/redbooks
  • International Technical Support Organization
    Service Level Management Using IBM Tivoli Service Level Advisor and Tivoli Business Systems Manager
    December 2004
    SG24-6464-00
  • Note: Before using this information and the product it supports, read the information in “Notices” on page ix.
    First Edition (December 2004)
    This edition applies to IBM Tivoli Business Systems Manager V3.1, IBM Tivoli Service Level Advisor V2.1, IBM Tivoli Enterprise Console V3.9, and IBM Tivoli Monitoring for Transaction Performance V5.3 products.
    Note: This book is based on a pre-GA version of a product and may not apply when the product becomes generally available. We recommend that you consult the product documentation or follow-on versions of this redbook for more current information.
    © Copyright International Business Machines Corporation 2004. All rights reserved.
    Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
  • Contents
    Notices
      Trademarks
    Preface
      The team that wrote this redbook
      Become a published author
      Comments welcome
    Part 1. Fundamentals
    Chapter 1. Introduction to service level management
      1.1 Service level management overview
      1.2 Service level management benefits
      1.3 Service level management components
        1.3.1 Processes
        1.3.2 Documentation
        1.3.3 People
        1.3.4 Tools
      1.4 Business service management approach to service level management
        1.4.1 Convergence of business service management and service level management
      1.5 Improving service level management through integration
      1.6 Scope of this book
    Chapter 2. General approach for implementing service level management
      2.1 A look at the ITIL process improvement model
      2.2 Planning for service level management implementation
        2.2.1 Identifying roles and responsibilities
        2.2.2 Understanding the services
        2.2.3 Assessing the ability to deliver
      2.3 Implementing service level management
        2.3.1 Developing service level objectives
        2.3.2 Negotiating on service level agreements
        2.3.3 Implementing service level management tools
        2.3.4 Establishing a reporting function
        2.3.5 Adjusting IT processes to include service level management
      2.4 Ongoing service level management program
        2.4.1 Maintenance of service definitions
        2.4.2 Service level agreement management via historical reporting
        2.4.3 Priority management of real-time faults
      2.5 Continuous improvement
        2.5.1 Improving quality of service levels
        2.5.2 Improving efficiency of service level management
        2.5.3 Improving effectiveness of service level management
    Chapter 3. IBM Tivoli products that assist in service level management
      3.1 IBM Tivoli product mapping
        3.1.1 The monitoring and measurement layer
        3.1.2 The service level management layer
      3.2 IBM Tivoli Business Systems Manager
        3.2.1 Business goals
        3.2.2 High level description and main functions
        3.2.3 Benefits of using IBM Tivoli Business Systems Manager
        3.2.4 Key concepts in IBM Tivoli Business Systems Manager
        3.2.5 IBM Tivoli Business Systems Manager architecture
      3.3 IBM Tivoli Data Warehouse
        3.3.1 Business goals
        3.3.2 High level description and main functions
        3.3.3 Benefits of using Tivoli Data Warehouse
        3.3.4 Key concepts in Tivoli Data Warehouse
        3.3.5 Tivoli Data Warehouse architecture
      3.4 IBM Tivoli Service Level Advisor
        3.4.1 Business goals
        3.4.2 High level description and main functions
        3.4.3 Benefits of using IBM Tivoli Service Level Advisor
        3.4.4 Key concepts in IBM Tivoli Service Level Advisor
        3.4.5 IBM Tivoli Service Level Advisor architecture
      3.5 IBM Tivoli Monitoring for Transaction Performance
        3.5.1 Business goals
        3.5.2 High level description and main functions
        3.5.3 Benefits of using IBM Tivoli Monitoring for Transaction Performance
        3.5.4 Key concepts in IBM Tivoli Monitoring for Transaction Performance
        3.5.5 IBM Tivoli Monitoring for Transaction Performance architecture
      3.6 IBM Tivoli Enterprise Console
        3.6.1 Business goals
        3.6.2 High level description and main functions
        3.6.3 Benefits of using IBM Tivoli Enterprise Console
        3.6.4 Key concepts of event groups in IBM Tivoli Enterprise Console
        3.6.5 IBM Tivoli Enterprise Console architecture
      3.7 IBM Tivoli Monitoring
        3.7.1 Business goals
        3.7.2 High level description and main functions
        3.7.3 Benefits of using IBM Tivoli Monitoring
        3.7.4 Key concepts in IBM Tivoli Monitoring
        3.7.5 IBM Tivoli Monitoring architecture
      3.8 Bringing it all together in support of SLM processes
        3.8.1 Service definitions
        3.8.2 Real-time monitoring
        3.8.3 Historical monitoring
        3.8.4 Fault management
        3.8.5 SLA reporting and alerting
        3.8.6 Problem and change management
    Chapter 4. Planning to implement service level management using Tivoli products
      4.1 Implementing SLM using Tivoli products
        4.1.1 Planning
        4.1.2 Implementation
        4.1.3 Ongoing SLM program
        4.1.4 Improvement process
      4.2 IBM Tivoli Business Systems Manager V3.1
        4.2.1 Propagation, alerts, and events
        4.2.2 Basic business system building
        4.2.3 Best practices for business system building
        4.2.4 IBM Tivoli Business Systems Manager business system types
        4.2.5 IBM Tivoli Business Systems Manager views in an SLM context
        4.2.6 IBM Tivoli Business Systems Manager roles in an SLM context
        4.2.7 Understanding your services
        4.2.8 Using IBM Tivoli Business Systems Manager 3.1 features for the benefit of SLM
        4.2.9 Using PBT and RLP to manage high availability scenarios
      4.3 Tivoli Data Warehouse V1.2
      4.4 IBM Tivoli Service Level Advisor V2.1
        4.4.1 Building SLAs in IBM Tivoli Service Level Advisor
        4.4.2 Supporting SLM with IBM Tivoli Service Level Advisor
        4.4.3 Realistic expectations for real-time SLAs
        4.4.4 Integrating IBM Tivoli Service Level Advisor with IBM Tivoli Business Systems Manager
      4.5 Additional products supporting SLM
        4.5.1 IBM Tivoli Monitoring for Transaction Performance
        4.5.2 IBM Tivoli Monitoring for Operating Systems
        4.5.3 IBM Tivoli Monitoring for Databases
        4.5.4 IBM Tivoli Monitoring for Web Infrastructure
    Part 2. Case study scenarios
    Chapter 5. Case study scenario: IRBTrade Company
      5.1 Background of the business and its current issues
        5.1.1 The business perspective
        5.1.2 The Information Technology perspective
      5.2 Existing IT infrastructure
        5.2.1 Systems environment
        5.2.2 Systems management
        5.2.3 Reporting
      5.3 A service level management solution
        5.3.1 Where we want to be
        5.3.2 Where we are now
        5.3.3 How we will get there
        5.3.4 How we will know we have arrived
      5.4 Implementation
        5.4.1 Additional instrumentation required
        5.4.2 Identifying the business service
        5.4.3 Identifying necessary users roles
        5.4.4 Required resource types
        5.4.5 Creating business systems based on business functions
        5.4.6 Defining executive dashboard views
        5.4.7 Agreeing to and defining service level objectives
        5.4.8 Identifying metrics
        5.4.9 Enabling data sources in IBM Tivoli Service Level Advisor
        5.4.10 Setting up schedules, realms, and customers
        5.4.11 Setting up offerings
        5.4.12 Setting up SLA in IBM Tivoli Service Level Advisor
      5.5 How the new solution works in practice
      5.6 Continuous improvement
    Chapter 6. Case study scenario: Greebas Bank
      6.1 Background to the business and its current issues
        6.1.1 The business unit perspective
        6.1.2 IT management perspective
      6.2 Existing IT infrastructure
        6.2.1 Systems environment
        6.2.2 Systems management
        6.2.3 Existing service level management
        6.2.4 Business service management
      6.3 A service level management solution
        6.3.1 Where we want to be
        6.3.2 Where we are now
        6.3.3 How we will get there
        6.3.4 How we will know we have arrived
      6.4 Implementation
        6.4.1 Stage 1: Defining services
        6.4.2 Stage 2: Enhancing instrumentation
        6.4.3 Stage 3: Determining users and roles
        6.4.4 Stage 4: Determining IBM Tivoli Business Systems Manager resource types
        6.4.5 Stage 5: Creating IBM Tivoli Business Systems Manager business systems
        6.4.6 Stage 6: Creating IBM Tivoli Business Systems manager views
        6.4.7 Stage 7: Agreeing to service level agreement objectives
        6.4.8 Stage 8: Defining metrics
        6.4.9 Stage 9: Preparing for ETLs
        6.4.10 Stage 10: Preparing IBM Tivoli Service Level Advisor
        6.4.11 Stage 11: Creating offerings
        6.4.12 Stage 12: Creating SLAs and OLAs
        6.4.13 Stage 13: SLA reporting
      6.5 How the SLM solution works in practice
        6.5.1 Example 1: Component failure without loss of service
        6.5.2 Example 2: Component failure terminates a service
        6.5.3 Root cause analysis
        6.5.4 Assessing the SLM solution
      6.6 Continuous improvement
    Part 3. Appendixes
    Appendix A. Service management and the ITIL
      The ITIL
      Service management
        Service delivery
        Service support
      Service support disciplines
        Configuration management
        Service desk
        Incident management
        Problem management
        Change management
        Release management
      Service delivery disciplines
        Capacity management
        Availability management
        Financial management for IT services
        IT service continuity management
        Service level management
      Bringing it all together
        Organization
        Processes
        Tools
      Constant improvement is a must
        Planning
        Delivery
        Measurement
        Calibration
      The power of integration
    Appendix B. Important concepts and terminology
      IBM Tivoli Service Level Advisor concepts
      IBM Tivoli Business Systems Manager concepts
    Appendix C. Scripts and rules used in this book
    Abbreviations and acronyms
    Related publications
      IBM Redbooks
      Other publications
      Online resources
      How to get IBM Redbooks
      Help from IBM
    Index
    © Copyright IBM Corp. 2004. All rights reserved.
  • Notices
    This information was developed for products and services offered in the U.S.A.
    IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
    IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:
    IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.
    The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
    This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
    Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.
    IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.
    Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
    This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.
    COPYRIGHT LICENSE:
    This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.
  • Trademarks
    The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
    Eserver®, DB2®, Redbooks (logo)™, ibm.com®, IBM®, Redbooks™, z/OS®, IMS™, Tivoli Enterprise™, AIX®, Lotus®, Tivoli Enterprise Console®, CICS®, NetView®, Tivoli®, CICSPlex®, OMEGAMON®, TME®, Database 2™, OS/390®, WebSphere®, Domino®, OS/400®, DB2 Universal Database™, Rational®
    The following terms are trademarks of other companies:
    Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
    Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
    Intel, Intel Inside (logos), MMX, and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.
    UNIX is a registered trademark of The Open Group in the United States and other countries.
    Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
    Other company, product, and service names may be trademarks or service marks of others.
    Peregrine ServiceCenter is a trademark of Peregrine.
  • Preface
    Traditional availability management focuses on managing the state of IT resources at a component level, without the context of the required service necessary to support vital business functions. As IT organizations mature and focus more on meeting business objectives, they recognize the value of providing sustained levels of availability. They also improve service quality in a way that is consistent with business objectives and cost constraints. Managing IT costs requires repeatable and measurable processes such as the best practices for service level management (SLM) documented in the IT Infrastructure Library (ITIL). Central to the ITIL best practices are the service management processes. These are subdivided into the core areas of service support (day-to-day operation and support) and service delivery (long-term planning and improvement).
    This IBM® Redbook takes a top-down approach that starts from the business requirement to improve service management. This includes the need to align IT services with the needs of the business, to improve the quality of the IT services delivered, and to reduce the long-term cost of service provision. It focuses on how clients accomplish this by implementing SLM processes supported by IBM Tivoli Service Level Advisor and IBM Tivoli Business Systems Manager.
    The approach used in this book leverages Tivoli® and non-Tivoli monitoring sources. IBM Tivoli Monitoring for Transaction Performance, IBM Tivoli Monitoring, and various IBM Tivoli Monitoring PACs, along with Peregrine ServiceCenter, serve as interface points to provide the end-user perspective of service delivery.
    IT managers and technical staff who are responsible for providing services to their customers can use this IBM Redbook as a practical guide to SLM with IBM Tivoli products. It takes you from a general outline of SLM to specific implementation examples from banking and trading that incorporate the Tivoli monitoring products.
The key elements that are addressed in this redbook are: Organizational considerations for implementing the ITIL processes Identifying which services or business functions will be used for the initial deployment Determining the metrics and monitoring sources required for operational and service level agreements (SLA) definition and evaluation, including business schedules and maintenance periods© Copyright IBM Corp. 2004. All rights reserved. xi
  • Leveraging IBM Tivoli Business Systems Manager for configuration and availability management of services, and Peregrine ServiceCenter for the service desk at a component level for SLAs, as well as managing service incidents in real time
    The value of understanding the impact of end-user response time on service delivery
    Managing end-to-end services that include mainframe and distributed components
    Improving service delivery with proactive service management using predictive analysis and operational status alerts
    Providing ongoing executive-level status and on-demand reporting
    The next steps for expanding the deployment using the ITIL continuous improvement process approach
    Overall business value attained through the implementation of these processes and tools

The team that wrote this redbook

This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization (ITSO), Austin Center.

Edson Manoel is a software engineer at IBM working in the ITSO, Austin Center, as a Senior IT Specialist in the systems management area. Prior to joining the ITSO, Edson worked in the IBM Software Group, Tivoli Systems, and in IBM Brazil Global Services Organization. He was involved in numerous projects in designing and implementing systems management solutions for IBM Clients and Business Partners. Edson holds a Bachelor of Science degree in applied mathematics from Universidade de Sao Paulo, Brazil.

Kimberly Cox is an IBM Certified IT Specialist with IBM Software Services for Tivoli. She joined IBM in 1998. She has six years of field experience and her current area of expertise is the architecture and deployment of IBM Tivoli Business Systems Manager/Distributed. She holds a master's degree in computer science and engineering from Pennsylvania State University.

Eswara Kosaraju is an advisory software engineer for the IBM Tivoli Software Group in Research Triangle Park, North Carolina. He joined IBM in 1999.
He holds a master's degree in science and technology in engineering physics from Regional Engineering College, Warangal, India.
  • Matt Roseblade is a services consultant with the PAN-EMEA Services for Tivoli Software based in the United Kingdom (UK). He has worked for IBM for nine years and has four years of experience in working with IBM Tivoli Business Systems Manager on engagements throughout Europe. Prior to working for IBM Software Group, Matt worked for IGS SSO leading a team responsible for the systems management of IBM and outsourced z/OS® systems across EMEA. During his 14 years in IT, Matt has acquired 12 years of experience in system management disciplines on the mainframe.

Alex Shafir is an advisory software engineer with the IBM Tivoli Software Group in Research Triangle Park, North Carolina. He has been working with IBM Tivoli Business Systems Manager since 1997 and joined IBM in 2000. He has over 30 years of IT experience in both technical and management positions. He has been involved in SLM, capacity planning, and performance management since 1984. He holds a master's degree in electrical engineering from Polytechnical Institute, Riga, Latvia.

Venkat Surath is a senior IT specialist, as well as an IBM Certified IT Specialist, and part of IBM Software Services for Tivoli Americas. He holds a master's degree in computer science from Illinois Institute of Technology, Chicago. Upon graduation, he joined Communications Products Division, IBM Research Triangle Park, NC in 1983 as a software engineer developing network management software. In 1997, he joined Tivoli Services North America and provides Tivoli Business Systems Management services. His areas of expertise include IBM Tivoli Business Systems Manager (Distributed) and Tivoli Monitoring for Transaction Performance.

Eduardo Tanaka is a software engineer for the IBM Software Group, Tivoli Division in Research Triangle Park, North Carolina. He worked nine years in UNIX® server hardware and software development and management for a Brazilian company.
Then, in 1990, he joined IBM where he served as the development, function and system test team leader for various system and network management products. He holds a degree in electronic engineering from the Instituto Tecnologico de Aeronautica in Brazil.

Brian Watson is a consulting IT specialist from Tivoli Services, EMEA North Region, IBM Software Group. He has worked for IBM for over three years, has over 25 years of IT experience in both public and private sectors, and specializes in systems management. He was one of the first people to be ITIL certified in 1995, and has successfully completed many large and complex systems management projects including implementations of IBM Tivoli Business Systems Manager.
  • Front row (left to right): Matt Roseblade, Kimberly Cox, and Venkat Surath; back row: Edson Manoel, Eswara Kosaraju, Eduardo Tanaka, Alex Shafir, and Brian Watson

Thanks to the following people for their contributions to this project:
    Peer van Beljouw, Ruth van Ouwerkerk (ABN AMRO Bank, Netherlands)
    Budi Darmawan, Morten Moeller (ITSO, Austin Center)
    Rosalind Radcliffe (BSM Integration Architect, IBM Software Group, Raleigh)
    Eduardo Patrocinio (Tivoli SWAT Team, IBM Software Group, Raleigh)
    Jayne T. Regan (Service Level Advisor Development Manager, IBM Software Group, Raleigh)
    Michael D. Tabron (Tivoli Service Level Advisor Interaction Designer, IBM Software Group, Raleigh)
    Joe Belna, Shawn Clymer, Subhayu Chatterjee (TSLA Development team, IBM Software Group, Raleigh)
  • Gareth Holl (TSLA L2 Support, IBM Software Group, Raleigh)
    Tom Odefey (TBSM SVT Specialist, IBM Software Group, Raleigh)
    Tony Bhe (ITM SVT Specialist, IBM Software Group, Raleigh)
    Jon O. Austin, John Irwin, Yoichiro Ishii (Tivoli Customer Programs, IBM Software Group, Raleigh)

Become a published author

Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners and/or customers. Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs, and increase your productivity and marketability. Find out more about the residency program, browse the residency index, and apply online at: ibm.com/redbooks/residencies.html
  • Comments welcome

Your comments are important to us! We want our Redbooks™ to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:
    Use the online Contact us review redbook form found at: ibm.com/redbooks
    Send your comments in an email to: redbook@us.ibm.com
    Mail your comments to: IBM Corporation, International Technical Support Organization, Dept. JN9B Building 003 Internal Zip 2834, 11400 Burnet Road, Austin, Texas 78758-3493
  • Part 1. Fundamentals

This part includes the following chapters:
    Chapter 1, “Introduction to service level management”
    Chapter 2, “General approach for implementing service level management”
    Chapter 3, “IBM Tivoli products that assist in service level management”
    Chapter 4, “Planning to implement service level management using Tivoli products”
  • Chapter 1. Introduction to service level management

This chapter introduces service level management (SLM). It also outlines an approach to the management of the business-oriented delivery of IT services that this book details in later chapters. Refer to Appendix A, “Service management and the ITIL” on page 447, for details about the organization and activities of SLM and the contributing IT management disciplines.
  • 1.1 Service level management overview

The goal of maximizing profits drives change as well as innovation. It often involves the use of IT to gain a competitive advantage in selling a company’s products and services. To achieve their goals, business units partner with an IT organization to implement technology projects and thus become IT customers. Accordingly, IT organizations are hired by business units to provide technology services, and they must meet the business units’ requirements for those services. In today’s cost-conscious environment, IT organizations are under pressure to reduce costs even as they must deliver a higher level of service to increasingly well-informed users.

Why service level management?

Customer perception of the availability and performance of these services drives customer satisfaction. As a service provider, an IT organization must be able to demonstrate and guarantee quality of service to its customers. However, IT management has often struggled to measure delivered services while reconciling such measurements with the perceived quality of this delivery. To solve this problem, IT organizations are deploying SLM that includes contracts between IT and its clients that specify the client expectations, IT’s responsibilities, and the compensation that IT will provide if the goals are not met.

The main factors driving interest in SLM are:
    Complexity: A dramatic increase in the number of applications, their importance, and demand on IT infrastructure
    Dissatisfaction: Increasing user sophistication and growing dissatisfaction among users with the service that they receive from IT
    Better technology: More mature technology that can provide end-to-end measurement, reporting, and management at a reasonable cost and offer a simpler process

What is service level management?

SLM is a means for the lines of business (LOB) and the IT organization to explicitly set their mutual expectations for the content and extent of IT services.
It also allows them to determine in advance the steps to take if these conditions are not met. The concept and application of SLM allows IT organizations to provide a business-oriented, enterprise-wide service by varying the type, cost, and level of service for the individual LOB.
  • According to the highly popular, process-based methodology IT Infrastructure Library (ITIL), SLM is the process of negotiating, documenting, agreeing, and reviewing business service requirements and targets, within service level requirements and agreements between service providers and their customers. These relate to the measurement, monitoring, reporting, reviewing, and continuous improvement of service quality as delivered by the IT organization to the business.

ITIL’s methodology provides two models for IT activities: service delivery and service support.

Service delivery
SLM, along with availability management, capacity management, IT service continuity management, and financial management for IT services, comprises the service delivery model. The primary role of this model is to offer a proactive process of planning and management of service according to the plan.

Service support
The service support model includes incident management, problem management, change management, release management, and configuration management. The primary role of this model is to offer operational implementation and monitoring of service according to the plan.

Figure 1-1 shows how the service delivery and service support models fit in the ITIL roadmap for service management.

Figure 1-1 The ITIL service management roadmap (elements shown include Planning to implement Service Management, Service Delivery, Service Support, Applications Management, Security Management, and IT Infrastructure Management)
  • According to the ITIL, SLM relates to the other aforementioned disciplines as follows:
    Supported by availability management, IT service continuity management, capacity management, problem management, and configuration management
    Provides information to incident management and change management
    Monitored via financial management for IT services, incident management, capacity management, and availability management
    Supports application management, business processes, and event management

SLM is a disciplined, proactive methodology. Its procedures are used to ensure that adequate levels of service are delivered to all IT users in accordance with business priorities and at an acceptable cost. Service levels typically are defined in terms of availability, responsiveness, integrity, and security delivered to users of the service.

Pros and cons of service level management

Although the duration and scale of SLM implementations may vary, both large and small corporations can capitalize on the benefits of SLM. They do so by choosing the components that are most appropriate to their specific SLM needs.

Implementing SLM requires time and effort. It is difficult to justify allocating IT resources to this project if IT is already working with limited resources. In addition, IT clients sometimes abuse the SLM processes, especially when they aim for unreasonable or unattainable service level commitments. However, this should not stop IT management from developing SLM, which can be equally important for both business units and an IT organization.

SLM increases the efficiency of an IT organization and introduces a financial incentive and penalty system for service delivery. Indeed, the rising popularity of SLM testifies to its value. For an IT organization, effective SLM is often a matter of survival, particularly if its mission is to operate as a business. The product of an IT organization is the service it delivers to business units.
For an IT organization, providing quality services is not enough. The service must consistently be of the same high quality both in actual delivery and in the eyes of the users of the services. SLM helps IT organizations improve both the quality of the services provided and the quality of the services as perceived by the users of IT services. Refer to Appendix A, “Service management and the ITIL” on page 447, for a definition of quality of services and how it is perceived by users and customers of IT services.

Both an IT organization, as a seller, and a business unit, as a buyer, need a contract that clearly defines both the capabilities and limitations of this process. For reasons of customer satisfaction and cost control, the product must meet the specifications of this contract.

1.2 Service level management benefits

Businesses need to respond quickly to market demands and seek to maximize profits. These goals often result in a high volume of change for IT organizations. Every IT organization has an objective to align its goals with business requirements and to better support business needs. They use SLM to ensure that scarce IT resources are prioritized to focus on key business requirements. By implementing SLM, IT organizations can achieve many of their goals. However, they must overcome many challenges to ensure that the SLM program is successful.

Goals
The goals of SLM are:
    Understand and meet the requirements of customers and end users
    Use resources efficiently and effectively, and provide value for money
    Improve continuously through a process of learning and growth
    Use internal processes to generate added value for customers and survive
    Establish a business-like relationship between the customer and supplier

Challenges
The challenges of SLM are:
    Divergent views of business and IT organizations
    Diversity of organization business areas
    Changing the mind set from products and systems to services
    Perception of IT (historically not always good)
    Unknown components, dependencies, and ownership
    Poor quality management information and metrics
    Unable to justify investment or assess risk
    No measure of proof of improvement
    Coping with infrastructure complexity
    Providing consistent and stable services
  • Faced with many constraints, an IT organization wants recognition for providing good services based on component-centric measurement metrics. At the same time, business units feel that they are paying for a service but cannot perform their work, and they do not trust an IT organization that always reports good service. SLM offers an evolution in measuring IT effectiveness by moving from the component-based evaluation of service to service-based management.

Figure 1-2 illustrates a situation where the reduction of the downtime of components reported by the IT organization does not improve customer satisfaction because the damage has already been done. It emphasizes the fact that business units and IT organizations have different views of the customer perception of the quality of the services provided.

Figure 1-2 IT and business views often differ (plot over time of the IT manager's measurements of IT component downtime against the business manager's measurements of customer impact from outages)

When used correctly, SLM helps an IT organization to deploy resources fairly, defend itself from user attacks, and advertise good service.
  • How can SLM help IT to deploy resources fairly?

Client satisfaction
SLM requires IT management to initiate a dialog with business units to understand the requirements for service. It also forces business units to clearly state their requirements and expectations. Improved client satisfaction is the main benefit of SLM, which ensures it through negotiated SLAs, established benchmarks for service measurement, and continuing dialog through reporting and reviews.

Managing expectations
SLM makes it possible to avoid expectation creep, that is, rising levels of undocumented expectations among IT clients. Undocumented user requirements and expectation levels usually lead to expectations staying ahead of the service that is being delivered. SLAs document negotiated requirements and establish expectations. They also serve as brakes when users want higher levels of service than IT committed to deliver.

Resource regulations
SLM provides a mechanism for governing IT resources. It allows IT to reject demands for resources from applications that unfairly tie up resources, and therefore to regulate workload based on business priorities. SLM helps to avoid capacity problems by providing early warning of SLAs being violated. Additional equipment might be required to support IT commitments.

Cost control
SLM helps IT to determine, through dialog with users, the level of service required and to determine the acceptable capacity and staffing it needs to provide. SLM can demonstrate that desirable service is not always affordable and can impact costs through moderating user demands for higher levels of service. It allows IT to explain the financial impact of higher levels of service and avoid unnecessary cost by forcing users to justify the additional cost.

SLM helps to change relationships between business units and IT from a negative acceptance of IT as a necessary evil to viewing IT as an asset in executing their mission.
When clear service objectives are documented and negotiated measurement reporting is in place, IT has the means to manage its resources as well as user dissatisfaction.

Benefits
In summary, the benefits of SLM are:
    IT service designed to meet agreed requirements
    Clearly defined roles (activities, responsibilities, and authority)
    Measurable, realistic SLAs for improved customer and supplier relationships
    Balances service requirements against the costs
  • Reduces risk of unpredictable demand and capacity problems
    Helps identify service weaknesses
    Allows underpinning of supplier management
    Provides basis for charging and measuring value
    Establishes an improvement baseline

1.3 Service level management components

To create and maintain SLM, IT managers need well-defined processes, proven tools, a dedicated effort, and a business-wide commitment. SLM shifts IT management perspective away from technology and toward the demands of the business and user experiences. It introduces new methods and procedures and enhances the old ones.

SLM focuses on the management of an IT service in support of a specific business process. An IT service includes applications and infrastructure resources used by this business process. Management includes planning, monitoring, and reporting. SLM uses SLAs to identify service and determine its management criteria.

SLM is a process that is supported by several other processes, including performance and availability management. Both performance and availability management processes are essential for monitoring SLAs. However, an understanding of end-user perspectives through synthetic transactions and communications with users is also critical. Accordingly, monitoring of performance and availability must be adjusted to account for user experiences. For this reason, IT operations must incorporate end-user experiences and business function knowledge into the management of IT infrastructure and applications. In addition, IT support must incorporate business requirements into asset management, change management, and incident management.

The following sections introduce four SLM components that are essential for implementing a successful SLM program:
    Processes
    Documentation
    People
    Tools
  • 1.3.1 Processes

The functions in SLM can be divided as follows:

Identify users’ expectations and define parameters for service.
Ideally, IT must identify all of the business processes that must be managed. In practice, it is acceptable to select the critical business processes during the first stages of the SLM process implementation and then incorporate additional business processes as the SLM process matures. The IT organization can work with business owners to pinpoint the elements of these business processes. They can define service parameters such as end-user expectations of service, participating IT application and infrastructure components, and metrics for measuring service levels.

Assess service capabilities and negotiate service agreements.
First, an IT organization must have a clear understanding of service expectations, composition of service elements, and service level measurement metrics. Then it must collect data and assess its current capabilities for meeting a customer’s expectation of service levels. After studying current capabilities for delivering all services required and identifying opportunities for improvement, IT management is ready to talk with customers about the service levels that it can provide. IT should avoid technical terminology and describe services and expectations in a manner that is understandable to its customers. At the same time, IT should fully understand what service levels it can deliver and achieve agreement from its customers on service level measurement and reporting criteria. IT must document negotiated expectations and measurement metrics as well as agreed-upon acceptable service level values.

Manage to meet service level objectives (SLOs).
IT must align its processes to proactively monitor, measure, and manage against negotiated SLAs. Accordingly, IT must develop SLOs to meet SLA obligations for underlying IT components, measure actual values against SLOs, and associate the measured status against the SLAs.
Upon recognition of service level degradation (preferably through real-time alerts), IT can immediately start locating the problem and restoring service to acceptable levels as defined by SLAs. If the problem is serious, IT may also notify users so they can avoid affected services and calls to the help desk.

Agreements that relate to IT operations and support (OLAs) help IT recognize component issues quickly and evaluate their measurements before they impact SLAs and IT customers. IT must come up with monitoring processes, measurement metrics, and automation that allow prompt responses to problems by technical staff in addition to reporting an OLA’s status to management.
  • SLM uses reporting to communicate overall service level performance to IT and business management. Effective reporting should show IT performance against service-level commitments (successes and failures). It can be used together with financial incentives to improve IT processes and user behavior.

Continue service refinement and improvement.
The SLM process should always be examined for process effectiveness, service changes, and reporting accuracy. Customer expectations change as business processes grow and new applications and users are added. As monitoring technology improves, IT can expand metrics that measure component performance and customer satisfaction. IT must periodically re-evaluate the services it provides. Service improvement is a continuous process that allows IT to add more value, adjust to new realities, justify new technology, and often derive more revenue. The same can be said about the SLM process, which needs continuous improvement to gain the trust of business owners and to improve efficiency through automation and effectiveness through a better understanding of business-to-IT relationships.

Figure 1-3 illustrates the SLM functions.

Figure 1-3 SLM process (Define parameters for services, Negotiate SLAs, Manage and monitor SLOs, Service refinement and improvement)
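The function of managing to meet SLOs, described in this section, involves measuring component values against SLOs and then associating that status with the SLA. The following Python sketch shows one possible way to picture that step. It is only an illustration: the names `Status`, `slo_status`, and `sla_status`, the thresholds, and the worst-case rollup rule are assumptions made for this example and are not taken from the ITIL or from any Tivoli product.

```python
from enum import IntEnum
from typing import Mapping


class Status(IntEnum):
    """Hypothetical status levels, ordered from best to worst."""
    OK = 0
    WARNING = 1
    VIOLATED = 2


def slo_status(measured: float, objective: float, warning: float) -> Status:
    """Status of one component SLO where lower values are better,
    for example a response time in seconds."""
    if measured > objective:
        return Status.VIOLATED
    if measured > warning:
        return Status.WARNING
    return Status.OK


def sla_status(component_statuses: Mapping[str, Status]) -> Status:
    """Assumed rollup rule: the SLA is only as healthy as its worst component SLO."""
    if not component_statuses:
        raise ValueError("no component SLOs supplied")
    return max(component_statuses.values())


if __name__ == "__main__":
    # (measured, objective, warning) in seconds; all values are made up.
    measurements = {
        "web server": (1.2, 2.0, 1.5),
        "database": (1.8, 2.0, 1.5),
    }
    statuses = {name: slo_status(*m) for name, m in measurements.items()}
    print(statuses, sla_status(statuses).name)
```

A warning threshold below the objective gives technical staff time to react before the SLA itself is affected, which is the role the text assigns to OLAs and real-time alerts.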
  • 1.3.2 Documentation

Because SLM relies on several parties involved in defining the processes, negotiations, penalties, and so on, documentation is a must. The following documents support SLM:

Service level agreements
An SLA is an agreement between business units (the customer) and the IT organization (the service provider). It describes the service and service level measurement metrics, defines the approval and reporting process, and identifies the primary users. It can also include financial terms and conditions. SLAs provide a mechanism for establishing accountability for both IT and their customers for the provided service levels, which are negotiated and agreed to based upon business requirements, priority, and cost. SLA measurements must be directly aligned with customer expectations. SLAs are the basis for service level evaluation and improvement processes that include periodic reviews and adjustments if needed.

Operational level agreements
An operational level agreement (OLA) is an internal agreement that should be established between all business and IT groups prior to the execution of an SLA. The OLA establishes specific requirements that each IT group needs to meet in support of service levels and makes them accountable for their contribution to the overall improvement of service levels. Well-defined OLAs show IT management which areas have more impact on service levels, where to focus attention and financial rewards, and how each group can contribute if business requirements require a change of SLAs.

Underpinning contract
IT should establish underpinning contracts (UCs) for any service provided by external service providers and vendors. UCs add accountability for the external components of service levels in the same way as OLAs account for the internal components of service levels. IT can use the contractual agreements that they have with their third-party vendors and feed the pertinent data into the SLM process.
As service levels need to be changed, IT may need to re-negotiate external contracts with vendors and modify the UCs. Figure 1-4 illustrates the flow of customer, internal, and external contracts.

Service catalog
The service catalog provides a place to document all services provided to the customers and to record such details as key features, components, charges, and dependencies for each service.
  • Figure 1-4 SLM customer, internal, and external contracts (customers hold SLAs with the IT services provider for Service 1 and Service 2; the IT infrastructure is backed by OLAs with the internal organization and underpinning contracts with external organizations)

Service level objectives
SLOs define the service levels that have been agreed to by the parties that negotiated the SLAs and that need to be monitored and reported. They include one or more service level indicators (SLIs) presented in the business context. The SLO defines the component of service and how it is being measured. SLIs determine measurement metrics for SLM quantification. SLIs should reflect the user perspective, such as pain points and priorities, service availability, and responsiveness. For example, the most common SLOs are availability and performance. A service availability SLO may include an SLI measured as the percentage of time that the service was in an available state. A performance SLO may include two SLIs: service responsiveness (response time) and completed work (number of transactions).

An IT organization must use monitoring for measuring the actual results of SLIs and reporting for communicating these results to business and IT managers. The format, details, and period vary depending on the recipients of reports. SLM can also include real-time information, alerting IT when results approach or breach the service levels guaranteed by SLAs.
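The availability and performance SLIs described under service level objectives come down to simple arithmetic. The following Python sketch is a minimal illustration of that arithmetic and of an "approach or breach" check. It does not reflect how IBM Tivoli Service Level Advisor evaluates SLAs; the function names, the 99.5% target, and the warning margins are hypothetical values chosen only for this example.

```python
def availability_pct(scheduled_minutes: float, outage_minutes: float) -> float:
    """Availability SLI: percentage of scheduled time the service was available."""
    if scheduled_minutes <= 0:
        raise ValueError("scheduled_minutes must be positive")
    if not 0 <= outage_minutes <= scheduled_minutes:
        raise ValueError("outage_minutes must be between 0 and scheduled_minutes")
    return 100.0 * (scheduled_minutes - outage_minutes) / scheduled_minutes


def classify(measured: float, target: float, warn_margin: float,
             higher_is_better: bool = True) -> str:
    """Compare a measured SLI with its target.

    Returns "breach" when the target is missed, "approaching" when the
    measurement is within warn_margin of the target, and "ok" otherwise.
    """
    # Normalize so that a larger gap always means more headroom.
    gap = measured - target if higher_is_better else target - measured
    if gap < 0:
        return "breach"
    if gap <= warn_margin:
        return "approaching"
    return "ok"


if __name__ == "__main__":
    # A 30-day month has 43,200 scheduled minutes; 130 minutes of outage is made up.
    month = availability_pct(43_200, 130)
    print(round(month, 3), classify(month, target=99.5, warn_margin=0.25))
    # Response time in seconds: lower is better.
    print(classify(2.4, target=3.0, warn_margin=0.5, higher_is_better=False))
```

The scheduled minutes would normally exclude agreed maintenance periods, which is why the text stresses that business schedules are part of the SLA definition.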
  • Service improvement program
SLM is a continuous process that includes service level improvement and SLM improvement activities. IT should never be satisfied with the current level of service even if it satisfies its obligations to customers. IT should develop a service improvement program and document a service quality plan. This plan should include how to maintain awareness of changing business objectives, cost-effectively add new technology, improve daily operations, and expand SLIs and reporting to match user perception of service as much as possible.

1.3.3 People

The SLM process requires the involvement of people at various levels within business and IT organizations. The request for service improvements often starts with the head of a business unit or a senior executive who begins demanding more consistent service and accountability from IT. IT management may respond with tactical improvements but may be forced to implement the SLM program.

SLM is a collaborative effort. Its implementation includes a number of people in dedicated or supporting roles. Responsibility for overall management of the SLM program is most likely to be assigned to a senior IT executive. IT may also assign a dedicated project manager and a dedicated service level manager. The project manager is responsible for implementing the SLM project. A service level manager is active throughout the entire implementation phase as well as after the phase. This person also coordinates ongoing management and improvement programs. In their effort, both the project manager and the service level manager need support from line managers of IT and business groups.

The SLM team must include representatives from both business units and IT service delivery and may require some assistance from consultants. However, SLM is primarily an IT effort because it is IT that must handle the technical aspects of the SLM implementation, deployment, and operation.
The SLM program must have an executive sponsor who provides funding for the program and is ultimately responsible for the success of the SLM program. For more details about the roles and responsibilities of the people involved in implementing SLM, see 2.2.1, “Identifying roles and responsibilities” on page 26.

1.3.4 Tools

While developing the SLM plan, the IT organization must choose tools to enable the SLM process that is being developed. Depending on the selected measurement metrics and the service composition of related IT resources, these tools support monitoring of the chosen service indicators and user experiences. They also provide analytical capabilities and aggregation for reporting.

In addition, IT must organize the collected data and make it accessible to everybody with a stake in the SLM process. Analytics and reporting must present this data in a manner that aligns the service views of both IT and their customers, allowing them to reconcile the customers’ perception of service with the service levels delivered by IT. IT wants to understand how resource performance and availability affect service levels and what adjustments are needed to improve service. Customers want to make sure that IT delivers availability and responsiveness to the critical applications that they use for automating their business processes. When their business process is impacted, they want IT to accurately report it so they can impose the negotiated penalties on IT.

SLM is a hot topic, and many companies have made claims that their products provide SLM solutions. Some products are specifically designed for SLM. Others offer only aspects of monitoring capabilities but still market their products as SLM solutions.

When implementing SLM, IT should choose the following tools to meet their design specifications:
    Monitoring tools to provide the measurement metrics they need to collect
    Reporting tools that process the data being captured and satisfy all levels of report recipients
    Analytical tools that provide aggregation and analysis of the collected SLM data in a manner that offers fast recognition of business impact and proactive response
    Administration tools that improve the productivity of SLM operators and users as well as provide the integration of monitoring, reporting, and analytical tools

This book introduces solutions provided by IBM, which include a wide range of products that can monitor a variety of distributed and mainframe servers, databases, transactions, networks, Web servers, and end-user experiences.
In addition, IBM offers analytical products in the SLM space that provide a real-time integrated event console, event correlation, business service management (BSM), and proactive SLM. All these products accept data from the majority of today’s monitoring products.
1.4 Business service management approach to service level management

The philosophy of managing services in a business context is receiving more traction with IT organizations that are trying to improve relations with their customers. These same organizations are also trying to overcome historical challenges such as customer perception and the increasing complexity of technology.

Understanding how shared infrastructure resources are being used by business processes significantly improves the ability of business and IT executives to negotiate, measure, and evaluate service contracts. Many IT organizations are turning to BSM solutions to facilitate a business-defined view of IT-delivered services. BSM solutions provide facilities and analytics that enable IT to manage service levels with the business consumer for a specific business process to ensure that the SLA associated with this process is fulfilled.

Why business service management?

Earlier, this chapter introduced SLM as the management of IT resources to deliver the required service at the required level of quality. BSM allows IT to incorporate business knowledge into the service management process and to translate data from traditional infrastructure and application management tools into business-level representations.

BSM relies on IT organizations that work with business units to map resource-to-service relationships and organize them into structures that depict and visualize the components of IT infrastructure as well as automate components of the business process based on the knowledge of their relationships.

Accordingly, with BSM, IT management and business executives can reconcile their perspectives of IT performance. This is because BSM can report both real-time status and historical service-level compliance for each business function supported by IT.

What is business service management?

BSM is a service management application that aligns IT operations with business processes.
Therefore, it allows business functions to receive maximum leverage from IT resource management. BSM solutions enable real-time management of events and service levels based on knowledge of their relationships to an IT service provided to a business entity responsible for a business process.

BSM provides IT with a set of algorithms and visualizations that IT must incorporate in its SLM processes. It is designed to display and report the service
delivery health and business impact of IT based on performance and availability of IT resources. The visualization of BSM runs on federated event and monitoring data as well as business and IT relationship data.

The four aspects of BSM are:
- Identifying the components of a business system
- Measuring the performance and availability of those components
- Ensuring that the components are performing within SLOs
- Alerting on any deviation or potential deviation from SLOs

The concepts behind BSM include:
- Resources are components of the IT infrastructure.
- A business transaction is a group of IT resources supporting a particular IT workload.
- A business system is a group of resources that supports a business goal.
- A business process is composed of some automated (IT services based) steps and some manual steps.
- When policy data or service level information is attached to a business system, it turns into an IT service.
- An IT service can be perceived as a collection of IT resources that make up the automated part of the business process.

1.4.1 Convergence of business service management and service level management

With BSM, an IT organization gains insight into a business process. It can use this insight to design SLM based on the aforementioned relationship structures that we call business systems.

A business system is a representation of a group of diverse but interdependent enterprise resources that are used to deliver specific business functionality. Business systems allow flexible and automated arrangements of IT resources into models of services that IT provides to automate business functions. Together, they represent what we call the Business/IT knowledge base that is an important element of the SLM methodology.
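
The idea that a business system groups resources and other business systems can be illustrated with a small data model. The following Python sketch is illustrative only: the class names, the three status levels, and the rule that a business system takes the status of its worst member are assumptions made for this example. They do not describe how IBM Tivoli Business Systems Manager represents or propagates status.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical health states, ordered from best to worst."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class Resource:
    """A component of the IT infrastructure, such as a server or a database."""
    name: str
    status: Status = Status.OK


@dataclass
class BusinessSystem:
    """A group of resources and nested business systems that supports a business goal."""
    name: str
    members: list = field(default_factory=list)

    @property
    def status(self) -> Status:
        # Assumed roll-up rule: a business system is as healthy as its worst member.
        return max((m.status for m in self.members), default=Status.OK)


web = Resource("web server")
db = Resource("database", Status.WARNING)
banking_app = BusinessSystem("banking application", [web, db])
online_banking = BusinessSystem("online banking", [banking_app, Resource("network")])

print(online_banking.status.name)  # WARNING
```

Because a business system can contain other business systems, a degraded database surfaces at the level of the online banking service without the service definition listing the database directly.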
As a result of a joint effort to develop the Business/IT knowledge base, an IT organization and business units have a framework for SLAs that allows them to:
- Identify all components of a service
- Create SLA and OLA contracts based on business systems
- Measure resource performance and availability by business systems
- Get service violation and trend alerts for any deviation or potential deviation from the SLO
- Ensure that services are performing within the SLO

The Business/IT knowledge base provides the foundation for BSM and SLAs. In reality, BSM allows IT to decompose business processes into IT systems and document the negotiated service levels in SLAs to be managed by BSM via monitoring and analytics organized by business systems. BSM accepts data from a variety of performance and event data sources that monitor IT resources. The BSM analytics then consume this data to determine the status of business systems and understand the business impact.

Figure 1-5 demonstrates that business systems are a cornerstone for establishing service levels and managing IT resources based on business objectives for IT services.

Figure 1-5 Business system organizes IT resources and other business systems

A successful SLM program that aims to solve user perception issues should establish a common understanding between business units and an IT organization on service delivery and quality of service measurements. As outlined earlier, the BSM approach to SLM helps this effort by collecting business knowledge and exposing the use of resources by services. This makes SLA contracts and measurement metrics more meaningful to both IT and business units.
1.5 Improving service level management through integration

SLM is the continuous process of measuring, reporting, and improving the quality of agreed-upon service that an IT organization provides to the business. This requires that an IT organization clearly understands each service it provides, its business importance and priority, who consumes this service and how, and which IT resources are used. Such information is usually dispersed and requires a significant effort from IT to obtain it and organize it in a meaningful way that can expose the business use of IT resources.

As demonstrated earlier, you can use BSM to compose and refine services from related resource and business systems objects. Service compositions defined by BSM allow IT to design SLAs and service level measurement criteria in an integrated manner and provide:
- Improved effectiveness of SLAs
  When an IT organization uses the same definitions of services for aggregating monitored data, service management, and service evaluation, it can significantly improve the effectiveness of SLAs and make investigations of SLA violations more productive.
- Improved effectiveness of communication
  Through a set of federated monitoring data and views, IT can use service compositions to effectively communicate with users (while developing and reporting SLAs) and to prioritize management of incidents.

Figure 1-6 presents a high-level view of integrating monitoring, service management, and service evaluation around service compositions.
Management of IT resources within the context of the business services they provide includes:
- Automatic discovery of IT resources and their relationships
- Automation for constructing services and business systems
- Detection of incidents for IT resources in a service context
- Determination of service status and business impact of incidents
- Warehousing of historical data for IT resources and services
- Service level evaluation and alerting in service context
- Reporting service health and service level compliance with SLAs
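
Service level evaluation and alerting can be pictured as a comparison of measurements against an objective. The sketch below is a simplified illustration under stated assumptions: the function name `evaluate_slo`, the use of average response time as the metric, and the 90% warning ratio are all hypothetical and are not taken from any Tivoli product. It distinguishes a deviation (a violation) from a potential deviation (a worsening trend that is close to the limit).

```python
from statistics import mean


def evaluate_slo(samples, slo_limit, warn_ratio=0.9):
    """Classify response-time samples (in seconds) against an SLO limit.

    Returns "violation" if the average exceeds the limit, "trend" if the
    average is within the limit but the second half of the window is slower
    than the first half and has reached warn_ratio * slo_limit, else "ok".
    """
    if mean(samples) > slo_limit:
        return "violation"
    if len(samples) < 2:
        return "ok"  # Not enough data to compare earlier and recent halves.
    half = len(samples) // 2
    earlier, recent = mean(samples[:half]), mean(samples[half:])
    if recent > earlier and recent >= warn_ratio * slo_limit:
        return "trend"
    return "ok"


print(evaluate_slo([1.0, 1.1, 1.0, 1.2], slo_limit=2.0))  # ok
print(evaluate_slo([1.2, 1.4, 1.8, 1.9], slo_limit=2.0))  # trend
print(evaluate_slo([1.9, 2.2, 2.4, 2.6], slo_limit=2.0))  # violation
```

A real evaluation would run per service against warehoused historical data and would use the metrics and thresholds negotiated in the SLA.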
Figure 1-6 Using business knowledge for managing IT services

Large enterprise IT environments deploy many system management products to operate their diverse resources. It is difficult to integrate data from such a variety of data sources into the SLM process. BSM solutions meet this challenge by accepting data from all major monitoring vendors. BSM then integrates this data by supplying business analytics and automation that allow IT to define and manage services throughout the life cycle of SLM.

Armed with business knowledge and negotiated service composition and measurement metrics, an IT organization can design its business system management, SLM, and monitoring processes to measure quality of service that correlates with user perception. To improve acceptance, IT must continue to
refine the service composition and measurement metrics until they become transparent to business units.

1.6 Scope of this book

As outlined in this chapter, there are many aspects to SLM. One of the main objectives is to relate the definition of service to the perception of IT users and business unit management. The quality of services delivered to these users is judged according to users’ ability to use services effectively and cost-efficiently when required by their job functions.

Although IT managers place a high priority on meeting this objective, the task of reporting on quality of service in a way that users accept as matching their experiences is often hit and miss. The BSM approach to SLM (outlined earlier in this chapter) offers significant improvements in this area by making business-to-IT relationships more factual and transparent through several implementation steps.

The topics in this book are structured to guide you from the analysis of SLM and its planning aspects to the detailed implementation of the BSM, SLM, and monitoring integration approach using Tivoli products. They include a summary of improvement opportunities for each topic.

The remainder of this book is divided into the following chapters:
- Chapter 2, “General approach for implementing service level management” on page 23, describes a generic approach for SLM implementation, following the ITIL process improvement model as closely as possible.
- Chapter 3, “IBM Tivoli products that assist in service level management” on page 53, provides an overview of the IBM Tivoli products that support SLM processes.
- Chapter 4, “Planning to implement service level management using Tivoli products” on page 109, outlines the planning and implementation of SLM and BSM through the integration of several IBM Tivoli products.
- Chapter 5, “Case study scenario: IRBTrade Company” on page 197, provides a test case of the SLM program implemented to manage the distributed environment for a trading company.
- Chapter 6, “Case study scenario: Greebas Bank” on page 315, provides a test case of the SLM implementation of enterprise management (mainframe and distributed) for a bank.
- Appendix A, “Service management and the ITIL” on page 447, discusses the various components and definitions behind Service Management in ITIL terms. It is designed as a reference for anyone involved in the SLM process.
Chapter 2. General approach for implementing service level management

Service level management (SLM) is an important initiative. It requires the participation and support of many resources. A successful implementation has an established business need, commitment from all those involved, and funding to ensure adequate resources and tools for completion. It requires a strategy and a flexible plan for negotiating, implementing, and maintaining service level agreements (SLAs).

The typical motivation for SLM is the need to improve IT service delivery as perceived by customers. In many cases, the team responsible for IT service delivery does not have all the information required to meet the needs of the business. As a result, IT delivers and reports on top quality service, while business units experience service that is perceived to be of a low quality. SLM provides a means to overcome this challenge, providing the many benefits described in 1.2, “Service level management benefits” on page 7.

Executive management commitment for SLM is essential since the goal of aligning IT and business requires an organization-wide commitment from both business and IT representatives. It takes hard work and discipline to implement SLM. Simply providing funding is not enough. Executive management can
facilitate commitment during the entire SLM planning and implementation cycle by continually motivating the change and leading by example.

This chapter describes a generic approach (Figure 2-1) for implementing SLM after a decision to do so is established. This methodology starts with a planning phase, continues on to implementation, and concludes with ongoing management and improvement of the overall process. It follows the IT Infrastructure Library (ITIL) process improvement model.

Figure 2-1 SLM processes implementation approach (panels: Planning, Implementation, Improvement Process, On Going SLM program)
Chapter 1, “Introduction to service level management” on page 3, introduces the four key components of SLM: people, processes, documentation, and tools. This chapter identifies and discusses each of these components in more detail.

2.1 A look at the ITIL process improvement model

An organization may already have some elements of SLM established and operational. Therefore, the approach taken in this chapter to present a method for SLM implementation is one of process improvement. This chapter applies the ITIL process improvement model to an SLM implementation.

The ITIL process improvement model is summarized by asking the following questions in the order presented:
1. Where do we want to be?
   This question provides the vision and objectives for an SLM implementation. It is answered by having a clear definition of provided services, determining the current perception of quality of the services being provided, and defining the desired quality of the services to be provided to customers. These topics are addressed in 2.2, “Planning for service level management implementation” on page 26.
2. Where are we now?
   Perform a thorough assessment of the existing IT infrastructure’s ability to deliver the defined services, and its existing monitoring capabilities. After this task is completed, perform a gap analysis of both the IT infrastructure and the monitoring capabilities so that IT can deliver services with the level of quality required by the business and expected by the customers. These topics are also addressed in 2.2, “Planning for service level management implementation” on page 26.
3. How do we get where we want to be?
   Based on the information gathered from the previous two questions, an IT organization prepares service level objectives (SLOs), constructs SLAs, and negotiates them with customers. This is also the time when additional IT infrastructure, monitoring tools, or both should be put in place.
   Most importantly, adjustments to existing IT processes to accommodate SLM are performed. These topics are addressed in 2.3, “Implementing service level management” on page 35.
4. How do we know we have arrived?
   When the implementation is complete, hold review sessions to ensure that all specified goals were met. Also discuss how to resolve unmet goals. Establish quality management for IT services and SLM process improvement programs
   at this time. These topics are also addressed in 2.3, “Implementing service level management” on page 35.

2.2 Planning for service level management implementation

This section describes the planning activities that lead to a successful SLM implementation. The desired output items of this phase are:
- A carefully chosen team capable and committed to implementing SLM
  This team should include the project manager and service level manager roles to keep deployment participants on track and communicating regularly.
- A thorough understanding of the services to be managed
  To accomplish this, collect information from both the business and technical perspectives and then have the service level manager mediate it. Business owners provide an overview of the major functions and an understanding of user demand. The IT service delivery organization provides detailed information about the components that make up the services that support the business functions. Identify the current perception of the quality of the identified services and the desired quality level of those services.
- An assessment of the ability to deliver services based on the expected level of quality
  This includes an understanding of the current capabilities of the IT infrastructure to deliver services to the quality expected by the business owners. Consider users’ current perception of service levels in this assessment. Based on this assessment, improvements to the IT infrastructure may be required. Define a high-level design that provides an assessment of the existing monitoring capabilities and additional monitoring tools and processes at this time. This forms a baseline for measurement of expected quality of services.

To some, all of this preparation may seem time consuming. However, it leads to clearer objectives, which in turn contribute to project success.

2.2.1 Identifying roles and responsibilities

SLM requires the participation and support of many different organizations of a business.
It is important to clearly define the roles and responsibilities of the people involved and to then identify the specific people to take on these roles. It is also important to involve all team members from the start of the project and to
facilitate regular deployment checkpoint meetings. This ensures that everyone has a consistent level of information throughout the deployment.

Choosing the correct people is critical. Whoever is chosen must represent the views of the decision makers from both IT and business organizations and have the final word on the SLM implementation plan.

The SLM deployment team should include people from the areas shown in Figure 2-2.

Figure 2-2 Key representation in an SLM deployment (Executive Sponsor, Project Manager, Service Level Manager, Business Representatives, IT Representatives)

The following sections summarize the responsibilities for the key participants.

Executive sponsor

The executive sponsor is typically the head of the line of business and is responsible for delivery of business services to end users. This person understands the overall picture of the business process and can state the purpose of the business. This person has the ultimate go or no-go authority for the project and is the final arbiter for problems and disagreements.

Project manager

Implementation of SLM is a large scale project and should be treated as one. Appoint a qualified, full-time project manager to work closely with the service level manager and other people involved in the project to incorporate the SLM activities into a project plan.
Service level manager

This is an important role and has the primary responsibility of project ownership. When an SLM project is owned by a service level manager, it is more likely to be effective and successfully produce the benefits that were intended. This person acts as a liaison between the business and IT units, ensuring that IT understands the business requirements and that the business units clearly state them. As such, the person or persons fulfilling this role must have either the appropriate seniority within the organization, or have clear, visible support from upper management from both IT and business organizations.

Additional responsibilities for the service level manager include:
- Creating and owning the SLM people structure within the organization
- Presenting the plan for SLM to all of the groups involved
- Describing how SLM will impact each group
- Describing how each group can contribute to a successful implementation
  This includes the risks and costs involved. The more complex the plan is, the higher the cost is (more servers, more people hours).
- Asking each group for support, involvement, and agreement
- Establishing a regular service level review process with both the customer and the IT provider
- Negotiating and maintaining the SLAs with the customer
- Negotiating and maintaining the OLAs with the IT provider
- Analyzing and reviewing service performance regularly against SLAs and OLAs, leading to adjustments as appropriate
- Creating and disseminating regular reports on service performance and achievement
- Coordinating temporary changes to required service levels

Business representatives

The primary responsibility for this role is to explain the overall and component-wise picture of the business. Business services may include a number of services that require IT support. Therefore, performance of business owners depends on IT performance. Business owners understand their service well but may not understand what comprises an IT service.
In large environments, this can be several people, one for each operational unit. A secondary responsibility for this role is to keep the SLM implementation business-oriented.
IT representatives

There are many responsibilities for this role, and they are typically fulfilled by more than one person. The responsibilities include:
- Providing systems management information such as hardware and operating systems, network infrastructure, application monitoring tools, and so on
- Describing the IT components of the business service
- Providing information about the day-to-day operation of the business components
- Providing feedback from customers to the overall SLM implementation process
  This is typically the service desk or customer support group with a primary line of communication to the service users.
- Providing the business impact of problem and change management
- Taking on the role of technical lead for the tools used in an SLM implementation
  This group should have or be ready to learn the skills required to deploy the actual tools to be used, as described in 2.3.3, “Implementing service level management tools” on page 38.

2.2.2 Understanding the services

The purpose of the activities described in this section is to improve the delivery of services to customers. You cannot do this without a clear understanding of what customers want and what they are getting now. This section establishes a high-level definition of the requirements.

When understanding the service, the people identified in 2.2.1, “Identifying roles and responsibilities” on page 26, should participate in the activities described in this section. Most of the information comes from the business representatives, who understand what needs to be provided in terms of services to meet the needs of the customers. The information also comes from the IT representatives, who understand what it takes in terms of IT resources to support the business processes.

The business representatives provide the functions of the services. The IT representatives provide information about the underlying IT components of the service.
The service level manager, who understands both business and technical aspects, is an important participant as well.

One way to obtain the required information is to arrange interviews with the right people, to feed back what was said, and check that you understand it correctly before moving on to the next stage. Another way to obtain the information is to
have moderated discussions with multiple people so that information and expectations can be level set among the business and IT participants.

Defining services

For the purpose of this redbook, a service is defined as a logical grouping of IT systems and applications that together deliver one or more functions to one or more users. From the IT perspective, it is a set of applications that serve a specific business objective, with each application comprising components made of IT resources. From the business perspective, a service is the mapping of IT resources to business processes.

According to the ITIL, a service is the IT system or systems that enable customers and users to implement business processes. For more information about the ITIL definition, see the SLM chapter in the ITIL Service Delivery book. That chapter also introduces and encourages the use of a service catalog.

Note: It is possible for a service to be made up of other services. For example, online banking can be a service that is made up of services for checking balances, depositing funds, withdrawing funds, and so on.

A high-level example definition of a service is as simple as this:
- My service is online banking.
- My service is a travel reservation system.
- My service is a payroll system.

To complete the definition of the service, you must now have an understanding of the underlying IT components that make up the service. Typically, a component represents a machine or an application with multiple event sources mapping to it. It is important to know what applications make up the components and how these applications relate to other applications, including dependencies.

The following list provides suggestions to assist in defining the business service:
- Business information
  – List the functions provided by the service. You may have to speak about applications if the concept of service is unfamiliar.
  – Describe the relationships between the functions.
    Provide a schematic that describes how each function is integrated to create the service. The schematic may include a business flow diagram.
- Technical information
  – Name the applications or components that deliver the service.
  – State the purpose of each application or component.
  – Describe the relationships between the applications or components.
    Provide a schematic that describes how each application is integrated to create the service. The schematic may include a data flow diagram. The relationships may also be described in an architecture document.

Table 2-1 provides a useful template for keeping track of components and relationships between components.

Table 2-1 Business service component relationships

Business component    Depends on                  Impact examples               Comment
Application server    Operating system, network   Application A availability    This application provides <...> to the business service.
Operating system      Hardware server             Applications running on an    The operating system is the platform for
                      availability                operating system              applications A, B, and C.
Network device        None                        Various

Establishing an initial perception of service

When an SLM process is in place and services that will participate in the process are identified, establish an initial perception of quality of those services and use it as a starting point for improvement through SLM. There are two sides to the perception of services. One side comes from the business owners and is defined in business terms as opposed to technical perception. The other side comes from IT service delivery and is likely to be in more technical terms.

From the business perspective, examples of initial perception of service may be:
- The Web site is rarely available in the evenings.
- Response time is unacceptable.
- We are losing customers due to bad service.

From the IT perspective, the perception of service may be:
- Servers are available 98% of the time.
- CPU utilization is at acceptable levels.
- Existing systems management tools are being underused.

As shown in this example, both perceptions are credible to the organization, yet distinct from each other. Record these perceptions, so that when implementation begins, you can reference them and choose appropriate metrics for measurement.
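
The two perceptions in this example can both be true at the same time, which a short calculation makes visible. The Python sketch below is a worked example under assumed numbers: a 30-day month, an evening window of four hours per day, and the worst case in which all downtime falls inside that window. None of these figures come from the scenarios in this book.

```python
def availability(total_hours, downtime_hours):
    """Return availability as a percentage of the measured window."""
    return 100.0 * (total_hours - downtime_hours) / total_hours


month_hours = 30 * 24          # Assumed measurement period: 720 hours
downtime = 0.02 * month_hours  # Downtime implied by 98% availability: 14.4 hours
evening_hours = 30 * 4         # Assumed evening window: 4 hours per day

overall = availability(month_hours, downtime)
evening = availability(evening_hours, downtime)  # Worst case: all downtime in the evening
print(round(overall, 1), round(evening, 1))  # 98.0 88.0
```

Under these assumptions, IT reports 98% availability while evening users experience about 88%. This is why the choice of measurement window is part of choosing appropriate metrics.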
The following list provides suggestions to assist in establishing the initial perception of service:
- Usage information
  – Number of users of the service
  – If applicable, a breakdown of function usage by company employees, business partners, the general public, etc.
  – Patterns or hours of usage, including peak times
  – How users access the service (Internet, intranet, extranet, legacy 3270 screens, etc.)
- The deficient and favorable points of current IT service delivery and how they are communicated to the IT organization
- The challenges faced by the business, including what is on the horizon by way of new or updated services
- Current issues with the business service functions

Table 2-2 provides a useful template for keeping track of usage information.

Table 2-2 Business service usage and perception

Feature         Time of day    Number of users    Method of access or type of user    Perception
TransactionA    Morning        <num>              Intranet                            Good
TransactionB    Noon           <num>              Internet                            Slow
TransactionC    Evening        <num>              <method>                            Poor
TransactionD    Midnight       <num>              <method>                            Excellent

Establishing the expected and desired quality of service

At this stage of the planning phase of SLM implementation, the business owners may define the expectation of quality of the services to be provided to customers and users. Expectations for the quality of services can be motivated by several points, for example:
- Retain the existing customer base and attract new customers.
- Cultivate customer loyalty.
- Prove superior service against competition.

Expected quality of service also has an IT perspective, which is likely to be:
- Align the IT organization with the business views.
- Increase visibility of improvements being done.
- Maximize potential of systems management tools.
  • Record these expectations, so that you can address them during the assessment phase. Depending on the expectations for the quality of services, you can expect changes and improvements to the existing IT infrastructure. Define desired quality-of-service objectives that make sense, are measurable, and are achievable. This helps to define the success criteria of the entire SLM implementation.

2.2.3 Assessing the ability to deliver
After you understand the service, assess the current operational environment by examining the IT infrastructure, and the existing and planned monitoring capabilities. This brings everyone to the same page and establishes a baseline for measurement. When this is completed, you may begin the implementation.

While information is collected, keep in mind the initial perception of service and the expected quality of service. The goal is to understand the components that provide the business service and the current IT infrastructure’s capability to deliver the services at the expected and desired quality. IT components are at a granular level and should be described in terms of specific applications, servers, and hardware. Management of the service is in terms of monitoring tools and can include specific monitoring thresholds.

Earlier this book described the business functions that made up the business service. This section breaks down these functions to help you understand how the IT resources affect them. It looks into the specific applications that are used to provide the function. It also looks at the network, hardware, and operating systems that run the applications.

Analyzing the existing infrastructure
Insufficient capacity of the IT infrastructure to deliver services often leads to bottlenecks, performance problems, and loss of availability, all of which contribute to degrading service delivery. Business components were identified in 2.2.2, “Understanding the services” on page 29. 
Now you must map these business components to IT components and verify the monitoring environment. Since several IT components make up the service, the capacity of each component must be balanced to the capacity of the other components. Capacity management processes must be in place to have a precise evaluation of the capabilities of the IT infrastructure. This is a crucial step toward negotiating SLAs. SLM processes require the assessment of the IT infrastructure capacity needs to accommodate the customer requirements that will be recorded in SLAs. After SLAs are negotiated, SLM processes set the targets for the IT infrastructure to deliver, and capacity Chapter 2. General approach for implementing service level management 33
  • management processes can report on the performance and throughput achievements for SLA evaluation. Assessing the existing monitoring capabilities Review existing monitoring capabilities and upgrade them as necessary. Ideally you must do this ahead of, or in parallel with, the drafting of SLAs, so that monitoring can be in place to assist with the validation of proposed targets. It is essential that monitoring matches the customer’s true perception of the service. Unfortunately this is often difficult to achieve. For example, monitoring individual IT resources, such as a server, does not guarantee that the service will be available to the customer. Without monitoring all IT resources in the end-to-end service, you cannot see a true picture. Monitoring tools collect information about IT resources using predefined measurement metrics. Metrics are the standard of measurement or a measurable quantity, associated with guaranteed service levels to create SLOs. Metrics evaluate performance, availability, or utilization of IT resources, such as transaction response time, CPU, and disk utilization. When implementing SLM, IT should choose the following tools to meet their design specifications: Identify measurement metrics required to measure the IT resources that make up the services. Use monitoring tools to provide the measurement metrics that need to be collected. Use reporting tools that process the data being captured and satisfy all levels of report recipients. Use analytical tools that provide aggregation and analysis of the collected SLM data in a manner that offers fast recognition of business impact and proactive response. Use administration tools that improve the productivity of the SLM operators and users as well as provide the integration of monitoring, reporting, and analytical tools. Compare this list to the existing system management and monitoring tools already in place in the IT infrastructure. 
In addition, organize the monitoring data collected by such tools and make it accessible to everybody with a stake in the SLM process. Analytics and reporting tools must be able to present this data in a manner that aligns the service views of both IT and their customers, allowing them to reconcile the customers’ perception of service with the service levels delivered by IT.34 Service Level Management
  • IT wants to understand how resource performance and availability affects service levels and what adjustments are needed to improve service. Customers want to make sure that IT delivers availability and responsiveness to the critical applications that they use for automating their business processes. When their business process is impacted, they want IT to accurately report it so they can impose the negotiated penalties on IT. Define a high-level design that provides an assessment of the existing monitoring capabilities as well as additional monitoring tools and processes. This forms a baseline for measurement of expected quality of services. Important: Do not include anything in an SLA unless you can effectively monitor and measure it at a commonly agreed point.2.3 Implementing service level management A successful implementation of the SLM strategy relies on the ongoing communication between an IT organization and business units. SLAs provide business representatives and the IT department with a common language to discuss goals, responsibilities, and management issues relating to IT services. The planning stage produces a high-level design of the proposed SLM solution. It is based on an understanding of user demands and an IT assessment of feasibility to meet customers’ requirements for services. As a result, the implementation stage begins with the detailed design for this solution that defines the SLOs and outlines the solution deployment plan. Based on this high-level design, an IT organization prepares SLOs, constructs SLAs, and negotiates them with users. At the same time, the IT organization begins the implementation of additional tools and makes adjustments to IT processes as required to support new functions.2.3.1 Developing service level objectives An IT organization manages service levels based upon objectives outlined by SLAs. IT drafts SLOs based on business requirements and an IT organization’s assessment of its capabilities. 
Then it seeks approval from its customers through negotiation. The starting point for SLAs is the business stating what IT services they need for the business to operate effectively. This may include both the minimum acceptable levels and the desirable levels. The IT department has to assess its capabilities to deliver at this level and negotiate with the customers. Chapter 2. General approach for implementing service level management 35
  • Achieving, or even approaching, the desirable level may require additional investment and may need to be addressed by a service improvement program. The negotiation stage is likely to be iterative.

SLOs are specifications of a metric that is associated with a guaranteed level of service that is defined in an SLA. The metrics by which SLOs are defined are often called service level indicators (SLIs).

From a business perspective, the most important objective is the availability and responsiveness of the service that IT provides to the business. Typically, IT responds to these business requirements by quantifying availability and performance:
– Availability: The percentage of the evaluation period when service was in an available state
– Performance: Usually represented by two SLIs: responsiveness (speed) and throughput (volume)

Additional SLOs may include accuracy (whether the service does what it is supposed to do), cost, security, number of incidents, time-to-repair, etc.

SLOs must meet the following criteria before you can include them in SLAs:
– Attainable: The objective is worthless if IT will never be able to meet it.
– Measurable: The objective is worthless if it cannot be measured.
– Understandable: Reported statistics must relate to the user experience.
– Meaningful: The objective must be relevant to all parties.
– Controllable: Do not include objectives that cannot be controlled.
– Affordable: The objective may require additional funding that sponsors are not willing to provide. Additional budget allocation is a business-level decision.
– Mutually acceptable: One party cannot simply dictate the terms of the agreement.

When developing an SLO, an IT organization needs to carefully select measurement metrics that are indicative of this SLO. For example, measuring availability from a user’s perspective is not a simple task. If an application is up and running, it does not mean that users can use it. 
If IT measures the availability of resources, it does not guarantee that this represents the actual user experience. There is no perfect solution to this problem. Nevertheless an IT organization must use SLIs that can be directly measured. SLAs must document each chosen SLI that will represent each of the SLOs and specify its data source.36 Service Level Management
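The availability SLO defined in this section (the percentage of the evaluation period when the service was in an available state) can be illustrated with a short calculation. The following is a minimal sketch, not a product feature: it assumes that outage intervals are already available from some monitoring data source, and the function name availability_percent and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def availability_percent(period_start, period_end, outages):
    """Percentage of the evaluation period in which the service was available.

    outages is a list of (start, end) datetime pairs. Outages are clipped to
    the evaluation period and overlapping outages are counted only once.
    """
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("evaluation period must have positive length")

    # Keep only outages that intersect the period, clipped to its bounds.
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in outages
        if end > period_start and start < period_end
    )

    down = 0.0
    merged_start = merged_end = None
    for start, end in clipped:
        if merged_end is None or start > merged_end:
            # A new, non-overlapping outage begins: close the previous one.
            if merged_end is not None:
                down += (merged_end - merged_start).total_seconds()
            merged_start, merged_end = start, end
        else:
            # Overlapping outage: extend the current merged interval.
            merged_end = max(merged_end, end)
    if merged_end is not None:
        down += (merged_end - merged_start).total_seconds()

    return 100.0 * (total - down) / total


# Hypothetical 100-hour evaluation period with three recorded outages.
start = datetime(2004, 12, 1)
end = start + timedelta(hours=100)
outages = [
    (start + timedelta(hours=10), start + timedelta(hours=12)),
    (start + timedelta(hours=11), start + timedelta(hours=13)),   # overlaps the first
    (start + timedelta(hours=99), start + timedelta(hours=105)),  # runs past the period
]
result = availability_percent(start, end, outages)
```

Note that this measures the availability of whatever the outage records describe. As the text cautions, outage records for individual resources do not necessarily represent the user experience, so the SLA should state which data source feeds the calculation.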
  • 2.3.2 Negotiating on service level agreements SLOs set up the standards for measurements and determine requirements for monitoring tools. However, before they become a part of an SLA contract, an IT organization must settle with the business units on a mutual understanding of the SLOs and their targets. In the process of negotiating SLAs, an IT organization and its customers exchange information and seek reasonable service level targets. The business units must clearly communicate their requirements and explain the business impact if the proposed service is not acceptable. IT must clearly communicate their assessment of the attainable service levels, the proposed SLOs, and their limitations, as well as explain the costs associated with offering a higher level of service. When these negotiations are completed, IT must document the agreed upon SLOs and SLIs. Other components of the negotiated SLA may include: Term: Typically one to two years Scope: Business description, user locations, transaction volume, service hours Limitations: Transaction throughput, concurrent users, funding, etc. Remedies: Clearly defined penalties for non-performance; defined bonuses for delivering better than expected services Optional services: Current or future at additional cost Exclusions: Clear identification of what is excluded from this SLA Service variations: Different levels at different times, maintenance periods, etc. Reporting: Relevant, well understood list of all reports Administration: Description of ongoing effort and responsibilities Reviews: Validation of SLAs, SLM process, negotiate exceptions every six months Revisions: New SLAs possibly required for technology, workload, staffing, etc. Approvals: Assigned authority to approve changes and new SLAs Chapter 2. General approach for implementing service level management 37
  • 2.3.3 Implementing service level management tools
When planning for the SLM implementation, an IT organization performs an analysis of the existing management tools while assessing its capability to provide the measurements as required by the proposed SLAs. Any gaps in management tools must be investigated and further addressed as part of the SLO development and SLA negotiation activities.

Chapter 1, “Introduction to service level management” on page 3, introduces tools as one of four components of SLM. When implementing SLM, an IT organization must apply a strategy for the implementation of management tools based on goals for its SLM program, requirements for SLA measurements, IT culture and processes, and the overall benefits and cost of implementation.

The effectiveness of the SLM management tools depends on how they are applied, and the right combination differs with each organization. Typically, an IT organization wants to reuse existing tools and add more tools as required. Simply having tools is not enough. They need to be applied correctly, which means they must be integrated into a solution.

Typically, SLM uses a combination of traditional primary data collectors that capture data directly from the managed environment and secondary data collectors that extract data from primary data collectors. In addition, SLM needs data from monitoring tools that can simulate user experiences.

Implementing service level management monitoring
An IT organization implements monitoring tools as required to manage the hardware and software components it operates: network management tools, performance management tools, incident management tools, etc. These management tools gather data for a range of purposes, one of which is SLM, where the focus is on monitoring the state and performance of IT services.

We previously defined a service as a set of IT resources used in enabling a business process. IT resources can be further grouped into a number of physical domains. 
Each physical domain is comprised of many subcomponent elements. The following list includes some of the major domains: Servers Network Storage Applications Transactions Databases Desktops38 Service Level Management
  • This simplistic view of IT domains does not account for the fact that each of these domains represents a number of different technologies integrated into complex configurations that can be managed by a variety of tools. However, when these domains are taken together, they control the quality of service. Therefore, it is necessary to install products for monitoring each domain.

From a functional perspective, SLM monitoring of the IT domains should include event monitoring, performance monitoring, usage monitoring, security monitoring, etc. In our illustration of a generic SLM implementation in this chapter, we do not address the specific monitoring tools. However, the following chapters demonstrate an example of SLM implementation using IBM Tivoli products.

The primary challenge before an IT organization, when it initiates the SLM program, is the question of which products to install and how to integrate them into the most suitable SLM solution. After IT completes the planning and the SLA negotiation phases, it usually has a clear understanding of the tools it needs to implement to support SLAs. It has already decided to acquire missing tools. When additional products are required, installing, customizing, and integrating the new products into the existing system management solution can be a significant part of the SLM implementation effort.

Since service can traverse multiple SLM domains, an IT organization must be able to view and evaluate the collected domain monitoring data for each supported service. In addition, SLM necessitates monitoring of user experiences of the delivered service through use of transaction monitors that can generate transactions and record their execution.

Implementing business service management tools
With the SLM focus on service specific monitoring, an IT organization is forced to change its approach to organizing the data it collected from monitors. 
It must now expose the relationships of IT components to business process components and aggregate the monitoring data in a way that shows its impact on a company’s business.

Chapter 1, “Introduction to service level management” on page 3, introduces the business service management (BSM) approach and the way to incorporate it into SLM. BSM solutions are designed to improve the effectiveness of SLM through a variety of views, analytics, and automation.

The implementation of BSM is a complex project that takes time and resources, but it simplifies and improves the ongoing management of IT events and service levels in the context of their impact on business. The topic of BSM implementation and its role in improving SLM are covered in greater detail in the remaining chapters of this book.
  • 2.3.4 Establishing a reporting function Service level reporting provides IT with a way to communicate the value and quality of its services. Reports are provided in formats that have been documented by SLAs and, therefore, are well understood by business managers. In addition to reporting service level performance, IT can use these reports to proactively address service difficulties. The reports must be simple and focus on the specific requirements of SLAs. This includes reporting achieved SLOs based on actual values of SLIs. The SLA should include a list of reports that IT intends to use for reporting on SLA compliance. For each report, the SLA should document the content, data sources, service level metrics, distribution, and frequency. In developing reports, an IT organization must categorize recipients based on their area of interest and responsibility. The requirements for each category may differ in perspective, presentation format, frequency, focus, and the granularity of information. IT should tailor reports to the recipient level and report only information that customers can understand. However, IT should also keep the supporting information and make it available when customers request to examine the data more closely. The three major categories of SLA report recipients are: Executive management Executives want to see how IT provides value to their business and how the quality of IT services affects business efficiency (including cost of degraded service in real dollars and lost opportunities). As a consequence, the executive reports must be highly summarized and outline the quality of IT service experienced internally by business units and externally by customers and business partners. In addition, executive management should understand the impact and cost of degraded services. These reports should use graphs and charts to communicate the overall assessment of the achieved service levels and relate their impact on business performance. 
Any experienced service difficulties should be explained with references to the support documentation as necessary. Business management Business units are interested in understanding how the quality of IT service helps them to achieve their business goals and the impact and cost of degraded service. The service level reports should relate the quality of IT delivered service to the volume of business transactions, staff productivity and customers satisfaction. It is not an easy undertaking. When reporting the40 Service Level Management
  • improved service levels, IT must relate this improvement to increase in business volumes, improved productivity, and better customer satisfaction. The same can be said about service outages and degradation. IT needs to demonstrate their impact on business performance and costs. IT management The service reports that IT distributes to business management should also be reviewed by all levels of IT management. This helps IT managers to understand how component failures and performance degradation affect service levels and impact business performance. In addition, IT management should receive the traditional technology reports that report the outages and performance degradation of resources as well as the response time and volume of application transactions. Using time as a correlation factor for both technology and service level reports, IT managers can gain knowledge regarding how the technology area that they manage affects the overall quality of IT delivered services. In addition to the SLA historical reporting (daily detailed reports, weekly summaries, monthly overviews, quarterly business summaries), an IT organization should implement the real-time alerting and proactive notification of customers and IT staff. It is important for real-time alerting of service outages and degradation to show the components that cause the impact, which business users are affected, and communicate business impact. As explained in Chapter 1, “Introduction to service level management” on page 3, BSM is well suited to perform this function.2.3.5 Adjusting IT processes to include service level management When planning for the SLM implementation, an IT organization must review its management processes and identify any adjustments needed to satisfy the requirements of its new mission. This provides an opportunity for IT to improve its responsiveness to business considerations as well as to improve its operation. 
Using the business knowledge it acquired during the SLM planning stage, IT can become more proactive in managing resources and establish priorities for its fault management process. As IT implements new monitoring and management tools, it needs to revise the operational procedures and documentation, staff new functions, and train operation personnel. In addition, IT should use the SLM rollout as an opportunity to improve the existing management practices in the following areas. Chapter 2. General approach for implementing service level management 41
  • Event management
BSM provides facilities that allow consolidation of all enterprise events and provide a single point for event management based on business priorities. This increases the value and productivity of the IT operation and service desk personnel. It also prompts IT to establish a control center function that will be responsible for managing events.

Important: There are some key benefits of well implemented event management processes. For example, IT management and business executives can evaluate the immediate business impact of IT events and understand how they affect SLA compliance. IT operations can prioritize fault management.

Availability management
SLM facilitates the transition from management of IT components to management of IT services and changes the metrics for measuring availability. When the underlying IT resources experience problems or become unavailable, the service may still perform satisfactorily if resources are duplicated. The focus of BSM on service state management significantly improves the understanding of services. It offers more robust capabilities to determine service states based on rules governing the impact of events received by the underlying resources.

Important: When managing availability, an IT organization must focus on identifying critical events for each service that by definition impact this service availability. IT operations can significantly improve the availability of IT services through the proactive management of critical events.

Capacity management
Monitoring the performance of IT physical domains, defined in 2.3.3, “Implementing service level management tools” on page 38, is a well established discipline in the majority of IT organizations. When implementing SLM, an IT organization requires additional aggregations of collected performance information to meet SLA obligations for reporting on the service level performance. 
Important: With BSM facilitating the mapping of resource-to-service relationships, an IT organization can improve its performance management processes by prioritizing the management of IT resources based on their business value. This approach also applies to proactively planning for additional capacity when service levels are in danger.42 Service Level Management
  • Change management
An IT organization uses the change management process to evaluate the impact of requested changes and, therefore, to reduce risk of pending requests. Both SLM and BSM can significantly boost the effectiveness of any change management process by supplying the criteria for risk evaluation, provided by SLAs, and facilitating impact visualization provided by BSM.

Important: An IT organization must adjust its change management process to evaluate implications of the requested changes on agreed service levels and understand their business impact.

Incident management
Some SLAs include SLOs for measuring service desk responsiveness and IT handling of faults. Service levels may include a time value for problem escalations and a mean-time-to-repair value. Every IT organization has some variation of an incident reporting system and escalation procedures.

BSM improves event management and incident recording. It provides capabilities for a proactive management of resources in need of repair. It often offers a bidirectional interface to a number of help desk solutions. Business focus of SLM and BSM enables an IT organization to improve its incident management process through timely recognition of faults, better understanding of their impact, and added value of SLA reporting.

Important: When implementing SLM, IT needs to integrate its manual processes and the help desk solution it uses for incident management with SLAs and BSM.

Cost management
SLM uses SLAs as a mechanism for governing use of IT resources to ensure that IT services are performing according to the SLA specifications. Customers become aware of cost implications while negotiating SLAs.

An IT organization must balance service cost with service delivery. As the service provider, IT should use service pricing as the mechanism for accounting for resource usage by business units. However, both resource accounting and services charges become a contentious issue between IT and business units. 
Important: When implemented, both SLM and BSM should have input into the cost management process. This enables an IT organization to establish the regulation of resource use based on business value and improve communication with business units when applying charges for services. Chapter 2. General approach for implementing service level management 43
  • Application support
Many enterprises have centralized all application development activities and infrastructure management activities under one IT organization. The scenarios in Part 2, “Case study scenarios” on page 195, use this model.

IT development organizations typically develop and support such applications. Application support staff work for IT development management and interface with both business and IT support departments. For this reason, application support people can greatly contribute to SLA development, while greatly benefitting from the SLM and BSM implementation.

Application support staff typically are well aware of the business process that IT is automating with its applications. The development organization often possesses the knowledge of service parameters such as the number of expected users, the expected response time, etc. In addition, the development organization may provide its own instrumentation to assist in managing performance of the applications that it implemented in support of business. However, application support staff often lack knowledge of the IT infrastructure and rely on IT support and operation staff when researching user problems.

Important: Application support people must be included in both the planning and implementation of the SLM and BSM programs. They should be involved in the design of service compositions for both SLM and BSM and should provide further input during their ongoing application support activities.

2.4 Ongoing service level management program
The SLM implementation program has supplied documentation, management tools, and SLOs to measure against. An IT organization has also completed review of its processes, identified the required adjustments, and established management reporting in support of SLAs. Now, the success of the SLM implementation hinges on the ongoing program of reporting, management, and improvements that aim to establish more trust between an IT organization and business units. 
SLAs provide a vehicle for communications and an instrument for management. IT must use both proactively in the ongoing effort to satisfy the SLM objectives through the following program of: Maintenance of service definitions SLA management via historical reporting Priority management of real-time faults44 Service Level Management
  • 2.4.1 Maintenance of service definitions As mentioned earlier, while planning for SLM, an IT organization must decompose business processes into IT services. Through interviews, IT obtains the required knowledge and uses it to define services by creating business views of IT resources. The SLM planning stage provides definitions of services and identifies the IT resource associations for each service. The initial business views of IT resources are created during the SLM implementation stage manually or automatically. Note: It is critical to accurately represent business use of IT resources in IT environments where the IT resource configurations and workloads change rapidly. An IT organization must address this issue through automatic discovery of dynamic changes in business-to-resource relationships based on policy rules. Business views are an important IT asset that must be protected and continuously updated. An IT organization must allocate resources to administer and continuously refine business views. This effort may vary depending on the SLM scope, tools, and the implementation strategy. Follow these few recommendations for ongoing management of business views of IT resources: Implement in phases. Begin simple and expand. Refine as necessary. Visualize the obtained knowledge of IT physical resources and their dependencies. Visualize the obtained knowledge of business process components. Construct business views by mapping business process components and IT resources. While defining a business view, consider only IT resources that are important for this business view. While defining a business view, always understand what it is for and who is going to use it. With the right tools, an IT organization can significantly improve the productivity of administering business views and their value for both IT and business units. 
BSM tools are designed to facilitate the creation and ongoing maintenance of business views as well as the rule-based dynamic mapping and management of relationships. Chapter 4, “Planning to implement service level management using Tivoli products” on page 109, addresses the use of business views in IBM Tivoli products in greater detail. Chapter 2. General approach for implementing service level management 45
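The idea of constructing business views by mapping business process components to IT resources can be sketched as a simple lookup structure. The following is only an illustration under stated assumptions: the component names, resource names, and the function affected_business_components are hypothetical, and real BSM tools maintain far richer relationship models than a flat mapping.

```python
def affected_business_components(business_views, failed_resource):
    """Return the sorted names of business components whose view
    includes the given IT resource."""
    return sorted(
        component
        for component, resources in business_views.items()
        if failed_resource in resources
    )


# Hypothetical business views: each business process component is mapped
# only to the IT resources that matter for that view.
business_views = {
    "Order entry": {"web-server-1", "app-server-1", "orders-db"},
    "Account inquiry": {"web-server-1", "app-server-2", "accounts-db"},
    "Billing": {"batch-server-1", "orders-db"},
}

impacted = affected_business_components(business_views, "orders-db")
```

Keeping each view limited to the resources that are important for it, as recommended above, is what makes a lookup of this kind meaningful: a fault on a resource then points only at the business components that actually depend on it.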
  • The ongoing administration of business views includes the following activities:
– Adding new business views upon requests from the IT change management team
– Adjusting business views upon addition of new resources
– Deleting business views that are no longer needed
– Ongoing maintenance of business views

2.4.2 Service level agreement management via historical reporting
Manual processes for producing SLA reports are labor intensive, time consuming, and prone to error, so most organizations want to automate SLA reporting. They do this by using custom reporting applications, but these are expensive to build and maintain. The best solution is to use off-the-shelf tools that can be configured to gather the required information and produce SLA reports automatically.

After SLAs are negotiated, deploy them for continuous monitoring and reporting. During the SLM implementation stage, an IT organization deploys monitoring tools that collect the negotiated measurement data from all IBM Tivoli Monitoring components that are covered by SLAs.

After SLAs are deployed, monitor and report on them in a timely fashion. The SLA terms include the time and frequency of reporting (for example, within five business days of the first of each month, the end of each month, etc.). Reporting metrics include daily or hourly summaries depending on the collection cycle.

SLA management relies on data derived from multiple sources. This data can either be collated via customized procedures (which are difficult and expensive to produce and maintain) or collected centrally with a mechanism such as the Tivoli Data Warehouse, as discussed in Chapter 3, “IBM Tivoli products that assist in service level management” on page 53.

The goal of SLA management is to report the status of services and their compliance with SLAs. Frequency of reporting may vary with the organization and user perception of the current service. 
Here are a few examples of reporting requirements:
– Both business and IT executives may want to review their set of reports at least once a month.
– Business executives may want to be notified every time that the service level for their SLAs is breached.
– An IT director may want to be copied on all notifications to business executives and receive notifications of any trends toward violation within some future period (usually the next 24 or 48 hours).
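A notification about a trend toward violation can be illustrated with a simple projection. The following Python sketch is only an illustration and not how any Tivoli product computes trends: it fits a least-squares line to hourly samples of a metric that must stay at or above a threshold, then checks whether the projected value falls below the threshold within the chosen horizon. The function names, the hourly sampling, and the linear model are all assumptions made for this example.

```python
from typing import Sequence


def projected_value(samples: Sequence[float], hours_ahead: int) -> float:
    """Extrapolate equally spaced hourly samples with a least-squares line."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The last sample sits at x = n - 1, so project hours_ahead beyond it.
    return intercept + slope * (n - 1 + hours_ahead)


def trending_toward_violation(samples: Sequence[float],
                              threshold: float,
                              hours_ahead: int = 24) -> bool:
    """True if the metric is still compliant now but projected to breach."""
    compliant_now = samples[-1] >= threshold
    return compliant_now and projected_value(samples, hours_ahead) < threshold
```

For example, hourly availability samples of 99.9, 99.8, 99.7, and 99.6 percent against a 99.0 percent objective are still compliant, but the fitted line crosses the objective well inside a 24-hour horizon, so the check returns True.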
Without automation, ongoing SLA management often fails to deliver the intended value despite a well-planned and well-executed implementation. It is unacceptable to business executives when an IT organization takes several weeks to consolidate technical reports into a combined view of service.

2.4.3 Priority management of real-time faults

In the process of planning and implementing SLM, an IT organization defines the services that it provides to automate business processes and documents the objectives for SLM in the SLAs. According to the ITIL, SLM is the continuous process of measuring, reporting, and improving the quality of services, but it does not specifically address the management part. You can assume that ITIL’s focus is on the traditional management cycle through historical reporting and reviews for managing SLAs that we addressed in 2.2.2, “Understanding the services” on page 29.

Service definitions provide alignment of IT resources with the business processes that they support, enabling management of IT resources based on their business value. The status of IT resources changes dynamically as they change state and receive normal and abnormal events. The ability of IT operations to handle the resolution of abnormal events (faults) hinges on the knowledge of their impact on business processes. Through understanding the business value of IT resources, IT operations can manage real-time faults based on business priorities.

SLM state management should consider several factors before deciding the final state of each service, such as the state and priority of the service components, the importance of events and their number of occurrences, recovery from faults through resource pooling, scheduled outages due to maintenance, components being repaired, and so on.
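To make the idea of deciding a final service state more concrete, here is a minimal Python sketch that combines three of the factors listed above: component state, resource pooling, and scheduled maintenance. It is an illustration under stated assumptions, not the algorithm of any IBM Tivoli product. The Component class, the three state names, and the rule that a pool survives while at least one member is up are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Component:
    name: str
    up: bool
    in_maintenance: bool = False   # scheduled outage, not counted as a fault
    pool: Optional[str] = None     # members of one pool can cover for each other


def service_state(components: List[Component]) -> str:
    """Return 'normal', 'degraded', or 'failed' for a service."""
    # Group pooled components together; an unpooled component stands alone.
    groups: Dict[Tuple[str, str], List[Component]] = {}
    for c in components:
        key = ("pool", c.pool) if c.pool else ("single", c.name)
        groups.setdefault(key, []).append(c)

    degraded = False
    for members in groups.values():
        faulty = [m for m in members if not m.up and not m.in_maintenance]
        if not faulty:
            continue                  # healthy, or down only for maintenance
        if any(m.up for m in members):
            degraded = True           # the pool absorbs the fault
        else:
            return "failed"           # nothing left to deliver this function
    return "degraded" if degraded else "normal"
```

With two pooled Web servers of which one is down, the service is reported as degraded. An unpooled database that is down makes it failed, unless the outage is scheduled maintenance.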
An improvement in fault management by operations has a direct impact on service levels that are measured by the following SLIs:
– Service availability: Better definition of availability and more granular measurement improve quality of service levels.
– Component repair time: Faster recognition of problems and better understanding of their impact allow accelerated repairs and improved IT performance.
– Service desk responsiveness: Better understanding of faults, their priority, and impact allows better communication with users and improves their satisfaction.
– Cost of support: Better understanding of faults, their priority, and impact can significantly increase productivity of control center personnel and IT support staff.

Fault management by business priorities also improves the quality of IT operations, increases the productivity of root cause analysis, and provides more visibility of IT value.

Ongoing management for the effective priority management of real-time faults is not practical without BSM tools. The remaining chapters of this book provide detailed examples of priority management of real-time events by IBM Tivoli products.

2.5 Continuous improvement

A central theme for the service level manager is continuous improvement of the implemented SLM processes. The improvement process for SLM must reflect the fact that business and IT requirements change constantly, users’ expectations tend to rise over time, and quality improvement must be proactive rather than reactive.

2.5.1 Improving quality of service levels

The process of improving service levels begins by reviewing the deployment. It is followed by a continuous tuning effort and the periodic adjustment of SLAs to reflect business and IT changes.

Deployment review session

The planning and installation team must review the completeness and accuracy of service levels. The team must analyze the problems that impacted service levels but were not captured by tools. It must also adjust service definitions and measurement thresholds and investigate the need for additional monitors.

Ongoing improvement through tuning

An IT organization is likely to implement an ongoing effort to tune its definitions of services, measurement metrics, metrics data collection, automation policies, and performance of IT resources. In addition, IT can initiate a service level improvement program, which is a more formal project to implement improvement actions derived from periodic reviews. The initial rollout of SLM often includes a few important but simple SLAs.
This is followed by a continuous expansion of SLAs, which in turn results in new requirements for service definitions, measurement metrics, and monitoring tools.
IT management should work with business executives to immediately address any issues of user distrust of the reported service levels and use these issues as an opportunity for additional tuning.

Periodic reviews of service levels

Based on the ITIL definition, the ongoing service level improvement process includes periodic reviews of service achievements and maintenance of SLAs. The service level manager is responsible for facilitating this effort.

Analyze the results of ongoing monitoring and reporting of service levels and periodically review them with customers. This is the appropriate time to discuss the service achievements and trends, issues of service perception, as well as opportunities for improvement. Also review the existing SLAs periodically for service completeness and accuracy, as well as the relevance of targeted measurements and objectives.

2.5.2 Improving efficiency of service level management

SLM interacts with other IT processes while providing business-oriented service. For more information, see Chapter 1, “Introduction to service level management” on page 3. The efficiency of SLM is determined by the level of its integration with other IT processes (including tools and skills) and the maturity of its program.
A natural maturation process of an IT organization that has initiated an SLM program involves the following stages:
– Evolution of monitoring (from component based to end-user experience based and then to service based)
– Management of service levels to reduce user impact of service degradations
– Proactive fault management based on business value
– Control of service in an automated fashion to proactively detect and correct problems
– Proactive prediction of future business requirements and the associated resources that are necessary to support business with the appropriate levels of service
– Integration of service management tools to enable IT users to decompose their business processes, automatically discover all supporting IT components, and review the quality of delivered service
2.5.3 Improving effectiveness of service level management

For IT, taking a proactive approach is the best way to improve the effectiveness of its SLM program. An IT organization must recognize the fact that user expectations and business requirements will continue to increase over time. Another important factor for a proactive approach to SLM is that IT can sustain, rather than repair, service levels, so that:
– External customer revenue, cost savings, and customer satisfaction (corporate image) can be sustained.
– IT can be more efficient and plan problem fixes in a controlled and orderly fashion based on business needs, rather than react to the next problem or to what appears to be the biggest problem.
– Customers and internal clients are more loyal.
– SLA penalties are reduced.

Proactive improvement of service level management process

After SLAs are in place, the SLM process acquires the service levels to strive for. However, simply reacting to problems and reporting the achieved service levels is the wrong approach. Only proactive improvement can guarantee continuous achievement of service levels.

SLM includes the proactive development of the right policies, procedures, organizational structures, and personnel skills to improve service level quality and to ensure that business processes are not affected by any service difficulties.

Continuous improvement of the SLM process must focus on improving relationships with users while adding value to business processes through IT services. Every component of SLM must be examined regularly for improvement opportunity, and any improvement must be proactively communicated to users. It is the responsibility of the service level manager to ensure that corrective actions are proactively developed and executed for all identified improvements.

The service level manager plays the central role in facilitating improvement for all aspects of SLM operation.
Activities include improving understanding of business processes, improving and calibrating SLAs, driving improvements in technology and operations, and improving communications with users. Through a proactive approach to SLM, an IT organization can increase its credibility and receive more cooperation from business units.

Proactive response to business changes

Every service level manager must proactively seek information from users about pending changes in the existing business processes and communicate this information to IT management, so it can adjust IT resources as needed.
IT must investigate any deviations in the existing service levels. If it finds that service violations resulted from changes in business volumes or user behavior, IT must proactively communicate its findings to business units and renegotiate service levels as necessary.

IT must also integrate the rollout of new business applications with its change management process and generate change requests for new service definitions and SLOs before deploying these applications in production.

Proactive management of service levels

Change is a constant factor in both business and IT environments. Maintaining a high quality of service requires a significant effort from any IT organization. It must anticipate the impact of changes while proactively improving its management of the existing SLAs, regulating resources, and managing user expectations.

Earlier, this chapter addressed service level improvement activities such as the ongoing tuning, the periodic reviews, and the service improvement program. The focus of this proactive effort is to ensure the most effective management of the existing SLAs to meet and even exceed the negotiated service levels.

Another aspect that contributes to the improvement of service levels involves the optimization of services, regulation of resources, fault management, performance tuning, and so on. When executed proactively, these operational activities allow IT to maximize resource use in support of SLAs and improve service levels.

Improvement in service levels may lead to increased user expectations of service. A proactive approach to service level improvements allows an IT organization to market its achievements in maximizing the service levels that can be attained at current costs, and to manage user expectations.

Proactive integration of tools and processes

SLM allows an IT organization to integrate a number of ITIL processes while applying business knowledge to managing IT infrastructure.
Appendix A, “Service management and the ITIL” on page 447, describes service management in great detail. The ITIL processes and the tools to support them continue to evolve. Most companies still have significant integration issues with available commercial products while trying to use these products for SLM.

IT must proactively research new technologies and enhance its practices based on the experience of others. The IT organization should always look for new solutions that provide better alignment between the IT organization and business units and that are more suitable for SLM. These solutions must provide more intelligent analytics, a broader scope of data sources, and visualization of business and IT components and their relationships.
Most management solutions today typically require significant customization. Integrating them with IT processes to provide SLM is a difficult and laborious effort. Chapter 1, “Introduction to service level management” on page 3, introduces a business-oriented approach for managing IT services, or BSM, and the value of its integration with SLM.

A proactive approach of process and tools integration around a single set of service definitions can significantly improve the efficiency and the effectiveness of any SLM program.

The remainder of this book demonstrates, via detailed examples and case studies, an SLM solution design that involves monitoring IT resources, monitoring of user experiences, and event correlation, as well as BSM automation, analytics, and reporting. Two test cases describe the integration of eight Tivoli products in support of two different SLM initiatives.
Chapter 3. IBM Tivoli products that assist in service level management

Chapter 2, “General approach for implementing service level management” on page 23, provides a generic approach to implementing service level management (SLM) processes. This chapter describes the key IBM Tivoli products used to implement them. It includes high level descriptions of the following products and how they integrate to provide an SLM solution:
– IBM Tivoli Business Systems Manager V3.1
– IBM Tivoli Service Level Advisor V2.1
– Tivoli Data Warehouse V1.2
– IBM Tivoli Monitoring for Transaction Performance V5.3
– IBM Tivoli Enterprise Console V3.9
– IBM Tivoli Monitoring V5.2
3.1 IBM Tivoli product mapping

Figure 3-1 shows a high-level representation of the IBM Tivoli products that can help to implement SLM. This chapter considers the two layers of components and describes the products that fit into each layer. The layers are:
– Monitoring and measurement metrics
– Service level management

[Figure: two stacked layers. The Service Level Management layer covers real-time management and predictive management, with IBM Tivoli Business Systems Manager, IBM Tivoli Service Level Advisor, and Tivoli Data Warehouse. The Monitoring and Measurement Metrics layer covers availability and performance. Under "Monitor Systems and Applications / User Experience" it lists IBM Tivoli Monitoring for Transaction Performance, IBM Tivoli Monitoring, IBM Tivoli Monitoring for Databases, IBM Tivoli Monitoring for Business Integration, and IBM Tivoli Monitoring for Web Infrastructure. Under "Event Correlation and Automation" it lists IBM Tivoli Enterprise Console, IBM Tivoli Monitoring for Transaction Performance, and IBM Tivoli NetView.]
Figure 3-1 Product mapping

3.1.1 The monitoring and measurement layer

The IBM Tivoli products in this layer monitor and measure the behavior of the IT infrastructure. They address two aspects of systems management:
– Availability management
  This includes products that monitor software and system resources to determine their availability. These products also provide functionality for event correlation across multiple platforms; assistance with determining the root cause of problems based on information gathered from multiple sources; automatic correction of problems; and automatic notification of support personnel.
  The IBM products directly relevant to SLM are:
  – IBM Tivoli NetView® Family
  – IBM Tivoli Enterprise™ Console
  – IBM Tivoli Monitoring for Transaction Performance
– Performance management
  This includes products that measure the internal performance of systems and applications. They also provide information about the experience of end users. The functionality includes continuous monitoring and recording of information, raising alerts when thresholds are exceeded, and gauging user experience by making response time measurements and running synthetic transactions. These products can monitor hardware, databases, and applications.
  The IBM products directly relevant to SLM are:
  – IBM Tivoli Monitoring for Transaction Performance
  – IBM Tivoli Monitoring
  – IBM Tivoli Monitoring for Databases
  – IBM Tivoli Monitoring for Business Integration
  – IBM Tivoli Monitoring for Web Infrastructure

3.1.2 The service level management layer

This layer contains components that enable organizations to closely align IT with business goals, meet service level commitments, ensure peak business service performance, and reduce support and licensing costs. They also help customers to focus limited resources on the most important areas of the business. The products in this layer address two aspects of systems management:
– Real-time management
  This includes products that evaluate the health of business functions in near-real time to alert operational personnel of service failures or degradation. The relevant product in this group is IBM Tivoli Business Systems Manager.
– Predictive management
  This includes products that collect performance and availability metrics and compare them with service level objectives (SLOs). The relevant products are:
  – IBM Tivoli Service Level Advisor
  – Tivoli Data Warehouse
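The comparison of collected metrics with service level objectives can be illustrated in a few lines of Python. This is a generic sketch of the arithmetic, not the evaluation logic of IBM Tivoli Service Level Advisor. The function names and the "met" and "violated" result strings are assumptions made for this example.

```python
def availability_pct(total_minutes: float, outage_minutes: float) -> float:
    """Availability over a reporting period, as a percentage."""
    if total_minutes <= 0:
        raise ValueError("reporting period must be positive")
    return 100.0 * (total_minutes - outage_minutes) / total_minutes


def evaluate_slo(measured: float, target: float,
                 higher_is_better: bool = True) -> str:
    """Compare one measured metric with its objective."""
    ok = measured >= target if higher_is_better else measured <= target
    return "met" if ok else "violated"


# A 30-day month has 43,200 minutes; 43.2 minutes of outage is 99.9 percent.
month = 30 * 24 * 60
print(evaluate_slo(availability_pct(month, 43.2), target=99.5))
# Response time objectives run the other way: lower is better.
print(evaluate_slo(2.4, target=2.0, higher_is_better=False))
```

The higher_is_better flag is there because availability objectives are lower bounds while response time objectives are upper bounds. A real SLA tool would also have to account for agreed schedules and planned outages.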
3.2 IBM Tivoli Business Systems Manager

IBM Tivoli Business Systems Manager is part of IBM’s business service management (BSM) portfolio of products, which provides intelligent management software to enable businesses to optimize their operational agility. For more information about IBM Tivoli Business Systems Manager, refer to IBM Tivoli Business Systems Manager Getting Started Guide, SC32-9088.

3.2.1 Business goals

Typical business goals addressed by IBM Tivoli Business Systems Manager are:
– Aligning IT operations with business priorities to maximize business value
– Optimizing IT resources to help manage costs
– Maximizing efficiency to drive productivity and revenue
– Optimizing service availability to achieve enhanced customer satisfaction

3.2.2 High level description and main functions

IBM Tivoli Business Systems Manager is a near real-time, event-driven systems management product. It can manage and monitor systems, applications, middleware, and other related systems management components in a business context.

Traditional systems management tools focus on technology and deliver only fragmented views of the health of the enterprise infrastructure. IBM Tivoli Business Systems Manager works in conjunction with IBM and third-party systems management tools to analyze the impact of faults and outages on business services.

IBM Tivoli Business Systems Manager provides your operations technicians with a view of IT infrastructure components as they relate to your overall business. It also provides your executives with a high level view of the status of critical services in your organization.

Main functions

The main functions of IBM Tivoli Business Systems Manager are:
– Console consolidation
  IBM Tivoli Business Systems Manager provides a consolidated view of systems management information derived from a wide range of existing IT management solutions and IT platforms. In doing so, it enables you to maintain the value of existing tools while reducing complexity.
For a full list of supported platforms and systems management tools, see IBM Tivoli Business Systems Manager Getting Started Guide, SC32-9088. This list includes:
  – Distributed systems products
    • IBM Tivoli Enterprise Console® 3.7.1 or later
    • IBM Tivoli NetView Version 7.1 or later
    • IBM Tivoli Monitoring Version 5.1 or later
    • IBM Tivoli Monitoring for Database, Application, Business Integration, Web Infrastructure, and Collaboration
    • IBM Tivoli Monitoring for Transaction Performance Version 5.1 or later
    • BMC Patrol Version 3.4
    • Computer Associates Unicenter TNG Versions 2.1, 2.2, and 2.4
    • NetIQ AppManager Server Version 4.02
    • Hewlett-Packard Openview Network Node Manager for Solaris and HP/UX
  – z/OS products
    • IBM Tivoli System Automation for z/OS Version 2.3
    • IBM Tivoli NetView for z/OS Version 5.1
    • IBM Tivoli Workload Scheduler for z/OS Version 8.1 or later
    • IBM Tivoli OMEGAMON® products
    • Various third-party schedulers and other systems management products from BMC, Computer Associates, and Allen Systems Group
– Monitoring from a business services perspective
  IBM Tivoli Business Systems Manager provides monitoring capability for a complex combination of system resources across multiple platforms. As a result, it provides views that reflect the business services being provided across the enterprise.
– Executive awareness of service status
  By providing executive dashboards that reflect the status of business services, IBM Tivoli Business Systems Manager provides executives in your organization with a clear and simple view of the status of their key business services.
– Impact analysis and critical path management
  IBM Tivoli Business Systems Manager provides views that clearly show the impact of faults in the infrastructure on business services. In doing so, it facilitates prioritization of fault resolution effort based on business impact. It also helps with the identification of single points of failure.
– Root cause analysis
  The various views and reports available in IBM Tivoli Business Systems Manager can be used to assist the process of root cause analysis.
  The Business Impact view shows resources that are affected by a fault and their relation to the resource with the fault. Also, the Event View displays the events that triggered the resource state change.
– Reporting
  IBM Tivoli Business Systems Manager provides standard reports out of the box. It also provides a process to export systems management data to the Tivoli Data Warehouse for analysis.
– Basing service level agreements (SLAs) on business services
  The close coupling of IBM Tivoli Business Systems Manager with Tivoli Data Warehouse and IBM Tivoli Service Level Advisor enables construction of SLAs based on the availability of business systems using out-of-the-box interfaces.
– Visibility of SLA breaches and trends
  The Tivoli Data Warehouse and IBM Tivoli Service Level Advisor interfaces also enable SLA breaches and trends to be made visible in executive dashboard views.
– Resource discovery
  IBM Tivoli Business Systems Manager includes several tools to assist in discovery of resources present in an enterprise to reduce implementation time and costs. See “Resource discovery” on page 61.

3.2.3 Benefits of using IBM Tivoli Business Systems Manager

Table 3-1 summarizes the advantages and business benefits of using the key features of Tivoli Business Systems Manager.

Table 3-1 Benefits and advantages of Tivoli Business Systems Manager features

Feature:   Provides business context for IT, enables greater accountability to business user needs, and improves ability to prioritize and optimize
Advantage: Allows IT staff to view IT resources in the context of critical business services and prioritize actions based on business impact and make intelligent trade-offs
Benefit:   Provides a business context for IT; enables greater accountability to business user needs; improves ability to prioritize and optimize

Feature:   Shows the relationship between applications
Advantage: Allows IT staff to make intelligent trade-offs, to easily spot inefficiencies and problems, and to quickly diagnose the root cause of complex failure scenarios
Benefit:   Increases availability (uptime) of critical business systems

Feature:   Automatically discovers and builds graphical views of applications
Advantage: Allows for the placement of discovered resources into containers that represent critical business systems and applications
Benefit:   Speeds implementation time; reduces errors; ensures currency and accuracy of management view

Feature:   Dynamically adjusts the business system view for components added, modified, or deleted
Advantage: Automatically keeps the business system view up-to-date by avoiding the problem of manual entry leading to obsolete information displays
Benefit:   Reduces errors and improves productivity

3.2.4 Key concepts in IBM Tivoli Business Systems Manager

To understand Tivoli Business Systems Manager, you must be familiar with the following concepts:
– Business systems
– Business system views
– Work spaces
– Resource discovery
– Event processing and propagation

Business systems

Imagine a Web-based insurance application. The infrastructure for the service may consist of a set of applications running on UNIX and Microsoft® Windows® 2000 servers (some outside the company intranet and others behind firewalls), legacy mainframe database systems, miscellaneous load balancers and other network devices, and diverse other components. Together they deliver the service that customers know as Online Insurance.

An IBM Tivoli Business Systems Manager business system is a logical container or folder that is populated with resources representing IT components. In this example, IBM Tivoli Business Systems Manager represents Online Insurance as a business system that contains icons that represent the resources that deliver the service.

Business systems can be created manually from the console, automatically by giving IBM Tivoli Business Systems Manager a set of rules, or via Extensible Markup Language (XML) files. For full details, see Chapter 4, “Planning to implement service level management using Tivoli products” on page 109.

There are three aspects of a business system:
– Resources: The group of resources that provide the business function
– Relationships: The hierarchical relationship between the resources
– Propagation rules: The method of dealing with events that affect the resources
Business systems may be built for different purposes, for example:
– Service based: A business system that contains a set of applications and other resources that support a service such as internet banking
– Department based: A business system that contains all resources supporting the accounting department
– Technology based: A business system that contains all UNIX servers in the enterprise
– Geographically based: A business system that contains all applications for the Europe, Middle East, Africa (EMEA) region

Business system views

IBM Tivoli Business Systems Manager displays business systems in business system views. These are used to monitor the availability of resources and the service as a whole. They also help to visualize the hierarchical relationships between the components.

There are several types of business system views for different purposes. They represent the information about business systems in different ways.
– Tree view: Displays resources in a tree format
– Hyperview: Displays resources in a navigable elliptical view with a selected resource as the launch point
  You can use this view to quickly navigate complex business systems using the mouse.
– Table view: Displays resources in a table and provides sorting and filtering options
– Topology view: Displays representations of the relationships between resources

IBM Tivoli Business Systems Manager can provide users with views appropriate to their responsibilities. It is a simple matter to configure one view for a specific user, such as the manager of the Web services group, and a different one for a group of users, such as the internet banking support team.
Work spaces

The IBM Tivoli Business Systems Manager systems administrator can design different work spaces for users. The work space setup determines what individual users see when they log on.

The systems administrator must design work spaces carefully to reflect the roles of the people using them. They must also focus the attention of support staff on the most important business services. A help desk may need a work space that includes a business system view based on the physical organization of systems and applications. But a CIO may want a work space that shows all the business processes in the enterprise, at a lower level of detail than the help desk.

Resource discovery

Before IBM Tivoli Business Systems Manager can monitor a resource, it must be aware of its existence, understand what type of resource it is, and know where it belongs in the enterprise. Even a medium-sized enterprise contains too many resources to record manually, so IBM Tivoli Business Systems Manager provides several mechanisms for discovering resources:
– Bulk discovery: This runs as a batch job on z/OS systems. It sends information about discovered resources to the IBM Tivoli Business Systems Manager database, where Load/Discover scheduled jobs are run to complete the processing. A similar bulk discovery process is provided for Tivoli Workload Scheduler for z/OS, and for distributed systems resources instrumented with monitors that communicate through the IBM Tivoli Business Systems Manager common listener interface, including IBM Tivoli NetView and CA Unicenter TNG.
– Rediscovery: This is similar to bulk discovery, except that resources already in the database are ignored. It is essentially a delta discovery.
– Auto discovery: When enabled, this process automatically discovers certain types of resources, including DB2®, IMS™, and CICSPlex® resources.
  Similar script-driven processes are available to drive delta discoveries for resources instrumented through the common listener interface and the set of IBM Tivoli Monitoring products.
– Discovery by event: This process discovers resources that were not previously identified, based on messages and exceptions sent to IBM Tivoli Business Systems Manager. If an event is received for an unknown resource, the discovery process creates the resource and posts the event to it.
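Discovery by event follows a simple "create on first sight" pattern, which the following Python sketch illustrates. The ResourceRegistry class and its post_event method are hypothetical names for this example and do not correspond to any IBM Tivoli Business Systems Manager interface. The sketch only shows the rule stated above: an event for an unknown resource first creates the resource, and then the event is posted to it.

```python
from typing import Dict, List


class ResourceRegistry:
    """Toy stand-in for a resource database keyed by resource identifier."""

    def __init__(self) -> None:
        self.resources: Dict[str, List[str]] = {}

    def post_event(self, resource_id: str, event: str) -> bool:
        """Post an event; return True if the resource had to be discovered."""
        discovered = resource_id not in self.resources
        if discovered:
            # Unknown resource: create it before posting the event.
            self.resources[resource_id] = []
        self.resources[resource_id].append(event)
        return discovered


registry = ResourceRegistry()
print(registry.post_event("db2/PAYROLL", "tablespace full"))    # True
print(registry.post_event("db2/PAYROLL", "tablespace extended"))  # False
```

The first call discovers the resource and the second only appends to it, so no event is lost simply because the resource was not registered in advance.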
Event processing and propagation

Chapter 4, “Planning to implement service level management using Tivoli products” on page 109, describes in detail how IBM Tivoli Business Systems Manager processes events. Events are sent to IBM Tivoli Business Systems Manager from both z/OS and distributed systems environments:
– z/OS events are forwarded to IBM Tivoli Business Systems Manager via the Source/390 address space on the z/OS machines.
– Distributed systems events are passed to IBM Tivoli Business Systems Manager via the Tivoli Enterprise Console or common listener interface.

When an event is forwarded to IBM Tivoli Business Systems Manager, it is associated with the resource representing the real-world object that gave rise to it, for example a CICS® transaction. The resource is included in one or more business systems that form a hierarchy of folders representing services.

The IBM Tivoli Business Systems Manager propagation engine then examines the priority of the event and compares it with the tolerance rates set for the resource. If the tolerance rate is exceeded, the propagation engine takes escalation action by sending a further event (called a child event) to the parent objects in the hierarchy. This process continues iteratively until all escalation steps are considered.

This process is called event propagation. It is the key component of IBM Tivoli Business Systems Manager’s ability to assess the business impact of events.

3.2.5 IBM Tivoli Business Systems Manager architecture

Figure 3-2 shows a simplified architecture diagram for Tivoli Business Systems Manager. For more information, see IBM Tivoli Business Systems Manager Getting Started Guide, SC32-9088.
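The event propagation process described in 3.2.4 can be pictured with a small Python sketch. It is a simplified illustration and not the actual propagation engine: it assumes a numeric event priority where higher means more severe, a single numeric tolerance per resource in place of the product’s tolerance rates, and a hypothetical Node class for the business system hierarchy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    tolerance: int                      # assumed numeric threshold
    parents: List["Node"] = field(default_factory=list)
    events: List[str] = field(default_factory=list)


def propagate(node: Node, priority: int,
              source: Optional[str] = None) -> List[str]:
    """Post an event to node and escalate child events to its parents."""
    origin = source or node.name
    node.events.append(f"priority {priority} from {origin}")
    affected = [node.name]
    if priority > node.tolerance:       # tolerance exceeded: escalate
        for parent in node.parents:
            affected.extend(propagate(parent, priority, origin))
    return affected


service = Node("Online Insurance", tolerance=1)
payments = Node("Payments", tolerance=5, parents=[service])
cics = Node("CICS transaction", tolerance=2, parents=[payments])

print(propagate(cics, priority=4))  # stops at Payments
print(propagate(cics, priority=6))  # reaches Online Insurance
```

In this sketch, a priority 4 event exceeds the tolerance of the CICS transaction but not that of the Payments folder, so the top-level service is unaffected. A priority 6 event climbs the whole hierarchy.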
[Figure: architecture diagram. On z/OS, Tivoli NetView for z/OS and Source/390 feed the Tivoli Business Systems Manager servers: host integration server, event handler server, history server, propagation server, database server, console server, Web console server, and health monitor server. The diagram also shows the console, the Web console, the agent listener and common listener services, the health monitor client, the Tivoli Data Warehouse, and a Tivoli Management Region with Tivoli Enterprise Console, the task server, event enablement, and distributed data sources (NetView, ITM).]
Figure 3-2 Tivoli Business Systems Manager flowchart

IBM Tivoli Business Systems Manager servers

IBM Tivoli Business Systems Manager is implemented on a set of Intel® servers running Windows 2003 or Windows 2000. The exact number of physical servers required depends on the size and type of enterprise being managed. IBM Tivoli Business Systems Manager Installation and Configuration Guide, SC32-9089, provides guidance on hardware and software prerequisites and physical placement of the following logical servers:
– Database server: This is based on the Microsoft SQL Server and hosts the IBM Tivoli Business Systems Manager data repository.
– History server: Actions and events from IBM Tivoli Business Systems Manager are regularly archived to this server for reporting and auditing purposes. Using a separate server for reporting improves the performance of the main database server and speeds up production of reports.
  • Console server: This supports IBM Tivoli Business Systems Manager clients using the Java™ console.
Propagation server: This performs impact analysis on events received by IBM Tivoli Business Systems Manager to determine what business systems are affected. Events are propagated to higher level business system objects in accordance with the business system hierarchy and propagation rules.
Event handler server: This processes events coming to IBM Tivoli Business Systems Manager from z/OS environments if these are being managed.
Host integration server: This is required if IBM Tivoli Business Systems Manager is to process events from z/OS machines that do not have the TCP/IP communications protocol installed. It handles Systems Network Architecture (SNA)-based communications used on legacy systems. In practice, most client implementations of Tivoli Business Systems Manager do not require this service.
Web Console application server: This supports clients accessing IBM Tivoli Business Systems Manager with a Web browser-based console. The Web console provides many of the views available to users of the Java console and is suitable for many types of users.
Health monitor server: This monitors the health and availability of the other IBM Tivoli Business Systems Manager servers and their related components.

3.3 IBM Tivoli Data Warehouse

Tivoli Data Warehouse provides a central repository in which you can store data about your IT infrastructure, including network devices and connections, desktops, hardware, software, events, and other information. Stored data is subsequently analyzed and used to produce reports about the behavior of IT components and services.

Important: Tivoli Data Warehouse is not an independent product. It is delivered free with all Tivoli Data Warehouse-enabled applications. All enabled Tivoli source applications are shipped with the necessary Tivoli Data Warehouse components to import their data into the central data warehouse.
For more information about Tivoli Data Warehouse, refer to Introduction to Tivoli Data Warehouse, SG24-6607.
  • 3.3.1 Business goals

Typical business goals addressed by Tivoli Data Warehouse are to:
– Provide a cost-effective means of storing systems management information
– Provide a basis for analyzing the IT infrastructure to achieve the best business value
– Provide a basis for SLA reporting

3.3.2 High level description and main functions

Using Tivoli Data Warehouse, you can store, in one place, data about your IT infrastructure, including network devices and connections, desktops, hardware, software, events, and other information. Depending on the data stored, you can analyze your IT costs, performance, and other trends across your enterprise. You can also show the value and return on investment (ROI) of Tivoli and IBM software, and identify areas where you can be more effective.

Moving data from operational data stores into a data warehouse keeps the operational stores running efficiently while preserving historical data for analysis over longer periods of time. Tivoli Data Warehouse comes with database optimizations for the efficient storage of large amounts of historical data and for fast access to data for analysis and report generation. It also provides the infrastructure and tools necessary for maintaining the data in the warehouse. Tools include the Tivoli Data Warehouse application, IBM DB2 Universal Database™ Enterprise Edition, IBM DB2 Data Warehouse Center, and IBM DB2 Warehouse Manager.

Tivoli Data Warehouse uses an open architecture to store, aggregate, and correlate historical data. This enables you to include data from your own applications and third-party systems management products as well as data from IBM Tivoli products.

If your enterprise supports multiple customers, you can keep the data in a single data warehouse, but restrict access rights so that customers can see and work with only their own data and reports. You can also restrict access rights at the level of an individual.

Crystal Enterprise Professional V9 is included for production of reports.
You can also analyze your data using any product that performs online analytical processing (OLAP), planning, trending, analysis, accounting, or data mining.

The user interfaces are available only in English, French, German, and Japanese. However, reports can be translated into other languages as listed in Installing and Configuring Tivoli Data Warehouse version 1.2, GC32-0744-02.
  • Main functions

There are four main functions within Tivoli Data Warehouse.

Importing data from source applications: This involves running a source Extract-Transform-Load (ETL) program, commonly referred to as an ETL1, to move operational data from the source location into the central data warehouse. Data is condensed as this is done.
Preparing data for use in reporting: This involves running a target ETL program, commonly known as an ETL2, to prepare data and move it into a data mart ready for use by the target reporting application.
Design and production of reports: Apart from producing simple reports, this is done using the functionality of the reporting or business intelligence tools rather than the Tivoli Data Warehouse itself.
Housekeeping: Various housekeeping jobs are run to maintain the database and archive old data at a predetermined point.

Many IBM Tivoli products are delivered with warehouse enablement packs (WEPs), which provide the ETLs needed for the previously listed processes.
The concepts of ETLs and data marts are explained further in 3.3.4, “Key concepts in Tivoli Data Warehouse” on page 67.

3.3.3 Benefits of using Tivoli Data Warehouse

Table 3-2 summarizes the advantages and business benefits of using the key features of Tivoli Data Warehouse.

Table 3-2 Benefits and advantages of Tivoli Data Warehouse features
Features | Advantages | Benefits
Central repository for systems management data | Can correlate and analyze data from various monitors in one place | Added value through cross-platform, business oriented reports based on an end-to-end view of the enterprise
Data consolidation | Reduced data storage costs and easier data management; a common data model | Cost savings and data consistency for reporting purposes
Open, proven, and out-of-the-box interfaces for many IBM Tivoli products | No need to develop data extraction programs | Cost savings through reduced interface development and testing costs
Being built on a relational database management system (RDBMS) architecture provides a high degree of scalability | Data warehouse can handle data for large enterprises | The warehouse can grow with the organization
Ability to use many analysis and reporting tools | Provides the ability to use the reporting tool of choice | Flexibility and standardization for the organization
Out-of-the-box reports for IBM Tivoli applications | Standard reports delivered with IBM Tivoli applications may be sufficient for many purposes | Reduced cost of designing and producing standard reports
Integration with IBM Tivoli Service Level Advisor | Out-of-the-box interface enables rapid development of SLAs | Rapid development of SLAs based on data in the warehouse
Built-in security | Ability to segregate data for different customers using out-of-the-box functionality | Ability to use one data warehouse for multiple customers to reduce costs and maintenance

3.3.4 Key concepts in Tivoli Data Warehouse

To understand Tivoli Data Warehouse, you need to be familiar with the concepts of ETL programs and data marts.

ETL programs

ETL programs process data in three steps.
1. Extract: Data is extracted from the data source.
2. Transform: Data is validated, transformed, aggregated, and cleansed so that it fits the required format.
3. Load: The processed data is loaded into the target database.

In Tivoli Data Warehouse, there are two types of ETLs, whose operation is shown in Figure 3-3.

Central warehouse ETL: Otherwise known as a source ETL or ETL1, this ETL extracts the data from the source applications and loads it into the central data warehouse.
Data mart ETL: Otherwise known as a target ETL or ETL2, this ETL loads data into data marts and is discussed in the next section.
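The three ETL steps can be sketched in a few lines of Python. This is only an illustration of the extract, transform, and load pattern, not a warehouse enablement pack: SQLite stands in for the DB2 databases, and the `samples` and `hourly_cpu` tables, their columns, and the validation rule are hypothetical.

```python
import sqlite3
from statistics import mean


def extract(source):
    """Extract: read raw operational rows from the source database."""
    return source.execute("SELECT host, hour, cpu_pct FROM samples").fetchall()


def transform(rows):
    """Transform: drop invalid rows and condense to one average per host and hour."""
    grouped = {}
    for host, hour, cpu in rows:
        if cpu is None or not 0 <= cpu <= 100:
            continue                     # validation: discard impossible readings
        grouped.setdefault((host, hour), []).append(cpu)
    return [(host, hour, mean(values))
            for (host, hour), values in sorted(grouped.items())]


def load(target, rows):
    """Load: write the condensed rows into the target database."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS hourly_cpu (host TEXT, hour INTEGER, avg_cpu REAL)"
    )
    target.executemany("INSERT INTO hourly_cpu VALUES (?, ?, ?)", rows)
    target.commit()


# Hypothetical source data: two valid samples and one out-of-range sample.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE samples (host TEXT, hour INTEGER, cpu_pct REAL)")
source.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [("web1", 9, 40.0), ("web1", 9, 60.0), ("web1", 10, 150.0)],
)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))
print(warehouse.execute("SELECT * FROM hourly_cpu").fetchall())
# [('web1', 9, 50.0)]
```

A data mart ETL would follow the same shape, with the central data warehouse as its source and a data mart as its target.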
  • Figure 3-3 Tivoli Data Warehouse ETLs: data source, ETL1, central data warehouse (schema), ETL2, data marts for SLA (Service Level Advisor) and for reporting (Web-based reports)

Data marts

Although it is possible to run a query against the entire central data warehouse, this is inefficient because of the large volume and range of data that builds up over time. Instead, data is prepared in advance for use in target applications, such as Crystal Reports, and placed in a data mart. A data mart is a subset of the historical data that satisfies the needs of a specific department, team, or customer. It is optimized for interactive reporting and data analysis. The format of a data mart is specific to the reporting or analysis tool you plan to use. Each application that provides a data mart ETL creates its data marts in the appropriate format.

The data mart ETL extracts from the central data warehouse a subset of historical data that is tailored to and optimized for a specific reporting or analysis task. The data mart ETL is also known as target ETL or ETL2.

3.3.5 Tivoli Data Warehouse architecture

Figure 3-4 shows the high level architecture of the Tivoli Data Warehouse in diagram form. Although Tivoli Data Warehouse can be implemented on the z/OS platform, most implementations are on distributed systems platforms. Only these are discussed in this redbook. For further information about the various possible configurations, see Implementing Tivoli Data Warehouse V1.2, SG24-7100.
  • Figure 3-4 Reporting with Tivoli Data Warehouse: application data stores feed the central data warehouse through ETL1, ETL2 populates star schema data marts, and a Crystal Enterprise server delivers Web-based reports; the diagram also lists the supported platforms, browsers, and Web servers

Tivoli Data Warehouse is implemented on a set of Intel or UNIX servers. The exact number of physical servers required depends on the size and type of the enterprise that is being managed. Tivoli Data Warehouse Release Notes Version 1.2, SC32-1399, provides guidance about hardware and software prerequisites, as well as the physical placement of the logical servers.

Figure 3-4 gives an overview of the Tivoli Data Warehouse 1.2 architecture and supported software components. The architecture can consist of the following elements:
– Tivoli Data Warehouse Control Center Server
– One or more central data warehouse databases
– One or more data mart databases
– IBM DB2 warehouse agents and agent sites
– Crystal Enterprise server

The following sections explain each of these elements in detail.
  • Tivoli Data Warehouse Control Center Server

The control center server is the system that contains the control database for Tivoli Data Warehouse. It is the system from which you manage your data. The control database contains metadata for both Tivoli Data Warehouse and for the warehouse management functions of IBM DB2 Universal Database Enterprise Edition. There can only be one control server in a Tivoli Data Warehouse 1.2 deployment.

Source databases

A source database holds operational data to be loaded into the Tivoli Data Warehouse environment. Typically, the source databases are application specific, and their number is likely to increase over the life of a Data Warehouse installation.

Most Tivoli products provide a WEP, which makes application-specific data available in a source database. This can be a dedicated warehouse source database, such as the one that comes with IBM Tivoli Monitoring. Or it can be an interface to the application’s built-in database, as provided for IBM Tivoli Storage Manager or IBM Tivoli NetView. A WEP for Tivoli products also includes the means to upload data from the source database to the central data warehouse, minimizing the effort required for data collection.

Central data warehouse

The central data warehouse is a set of IBM DB2 databases that contains the historical data for your enterprise. You can have up to four central data warehouse databases in a Tivoli Data Warehouse 1.2 deployment.

Data marts

A separate set of IBM DB2 databases contains the data marts for your enterprise. Each data mart contains a subset of the historical data from the central data warehouse that satisfies the analysis and reporting needs of a specific department, team, customer, or application. You can have up to four data mart databases in a Tivoli Data Warehouse 1.2 deployment. Each data mart database can contain the data from multiple central data warehouse databases.
A WEP for a Tivoli application provides all necessary means to fill data marts with their specific data.
  • Warehouse agents and agent sites

The warehouse agent is the component of IBM DB2 Warehouse Manager that manages the flow of data between data sources and targets that are on different computers. By default, the control center server uses a local warehouse agent to manage the data flow between operational data sources, central data warehouse databases, and data mart databases. You can optionally install the warehouse agent component of IBM DB2 Warehouse Manager on a computer other than the control center server.

Typically, you place an agent on the computer that is the target of a data transfer. That computer becomes a remote agent site, which the Data Warehouse Center uses to manage the transfer of Tivoli Data Warehouse data. This can speed up the data transfer and reduce the workload on the control server.

Crystal Enterprise Server

Crystal Enterprise Professional for Tivoli completely replaces the Reports Interface of Tivoli Enterprise Data Warehouse (TEDW) 1.1. It provides a new mechanism for obtaining the reports provided by the WEPs. You must install and configure a Crystal Enterprise environment before you begin installing Tivoli Data Warehouse 1.2. Tivoli Data Warehouse 1.2 supports only the full stand-alone installation of Crystal Enterprise. In the full stand-alone installation, Crystal Enterprise is installed on a single computer that is already running as a Web server.

Crystal Enterprise depends on a number of software components that must be up and running prior to its installation.

Operating systems
  – Windows NT®
  – Windows 2000
  – Windows 2003
Internet browser
  – Internet Explorer
  – Netscape Navigator
Web servers
  – IBM HTTP Server
  – Microsoft IIS
  – iPlanet Enterprise Server
  – Lotus® Domino®
  • 3.4 IBM Tivoli Service Level Advisor

IBM Tivoli Service Level Advisor provides SLM capabilities for enterprise organizations that need to measure, manage, and report on availability and performance aspects of their internal IT infrastructure. The SLM capabilities of IBM Tivoli Service Level Advisor complement the performance and availability measurement functions of other Tivoli products, such as IBM Tivoli Monitoring for Transaction Performance and IBM Tivoli Business Systems Manager.

For more information about IBM Tivoli Service Level Advisor, refer to Introducing IBM Tivoli Service Level Advisor, SG24-6611. This section provides a basic overview of the product, its components, and functions as needed to understand and implement Business Service Management.

3.4.1 Business goals

Typical business goals addressed by IBM Tivoli Service Level Advisor are:
– Provision of SLAs that are meaningful to businesses
– Automation of SLA report production to reduce costs and provide timely report delivery
– Provision of a mechanism to resolve disagreements on SLA achievement
– Provision of early warning of trends toward SLAs being breached

3.4.2 High level description and main functions

Tivoli Enterprise Monitoring and Business System monitoring tools usually store their availability and performance data in their own databases. This data is then moved into the Tivoli Data Warehouse using ETLs as explained in 3.3.4, “Key concepts in Tivoli Data Warehouse” on page 67. After all the source ETLs have written the latest data into the central data warehouse, the IBM Tivoli Service Level Advisor ETL moves a subset of this data into the SLM measurement data mart. Here it can be processed and analyzed against defined SLOs.

For example, an SLA can be based on response-time measurements against a Web application. IBM Tivoli Monitoring for Transaction Performance measures the response time of the Web site, breaking the service into associated sub-applications that complete a service transaction.
Data is moved to the Tivoli Data Warehouse database, from where IBM Tivoli Service Level Advisor can extract and analyze it using its built-in data-collector interface. It can then determine long-term trends. It can also generate reports showing violations, or trends toward violations, of guaranteed levels of service.
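The idea of reporting both violations and trends toward violations can be sketched as follows. This is not the product's trend analysis, whose algorithm is not described here. The sketch assumes a metric where higher values are worse (such as response time in seconds), a single breach value, and a plain least-squares line through equally spaced samples; all function names are hypothetical.

```python
def violations(samples, breach_value):
    """Return the indices of samples that exceed the breach value."""
    return [i for i, value in enumerate(samples) if value > breach_value]


def slope(samples):
    """Least-squares slope of the samples against their index (needs 2+ samples)."""
    n = len(samples)
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def periods_until_breach(samples, breach_value):
    """Rough estimate of how many periods remain before the trend crosses
    the breach value, or None if the metric is not trending toward it."""
    m = slope(samples)
    if m <= 0 or samples[-1] > breach_value:
        return None
    return (breach_value - samples[-1]) / m


response_times = [1.0, 1.5, 2.0, 2.5]             # seconds per evaluation period
print(violations(response_times, 4.0))            # []
print(periods_until_breach(response_times, 4.0))  # 3.0
```

Here no sample has breached the 4-second objective yet, but the upward trend suggests a breach in about three more periods, which is the kind of early warning the text describes.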
  • IBM Tivoli Service Level Advisor helps IT service delivery organizations to increase the business value of their delivered service by providing the ability to understand and measure service level attainment within their organization. This service level understanding helps to:
– Maintain productivity and customer satisfaction
– Verify end user service levels
– Analyze historical data to predict future service levels
– Manage costs, and improve planning by assuring offered services
– Measure, manage, and report on availability and performance
– Automate SLM based on SLOs
– Evaluate service delivery based on business schedules
– Provide Web-based customer reports

IBM Tivoli Service Level Advisor depends on the performance and availability data collected by a variety of monitoring and performance tools to deliver SLA reports and identify SLA trends. Figure 3-5 illustrates the flow of data.

Figure 3-5 Data flow in the IBM Tivoli Service Level Advisor: source applications 1 to N load the Tivoli Data Warehouse central data warehouse through source ETLs; registration and process ETLs feed the ITSLA environment (SLM server, SLM database, SLM measurement data mart, and ITSLA reports server)

Service level management life cycle with IBM Tivoli Service Level Advisor

SLM is an ongoing process. Both the service provider and the customer must regularly adjust the SLOs to achieve the best service level with reasonable cost and effort.
  • IBM Tivoli Service Level Advisor supports the full life cycle of the SLM process:
1. Creating the SLA
2. Monitoring and reporting the service level
3. Delivering and reviewing SLA reports
4. Ongoing refinement of SLAs

IBM Tivoli Service Level Advisor offers easy-to-use interfaces, quick and easy customization of features, and default values where appropriate. It is delivered with several additional IBM applications that support its functionality:

IBM DB2 Universal Database (DB2 UDB) Enterprise Edition: This database is used to store measurement data.
IBM Tivoli Service Level Advisor warehouse enablement packs (also known as warehouse packs): These include ETL routines both for collecting data from the central data warehouse and for writing data back into the central data warehouse for use by other applications.
IBM WebSphere® Application Server: This is used by IBM Tivoli Service Level Advisor as the operating environment for the administrative user interface and the reporting interface.

3.4.3 Benefits of using IBM Tivoli Service Level Advisor

Table 3-3 lists the features of IBM Tivoli Service Level Advisor, together with the advantages and benefits associated with them.

Table 3-3 The IBM Tivoli Service Level Advisor summary
Features | Advantages | Benefits
Automated SLA evaluation | Eliminates the process of manually reviewing and correlating component-level reports against customer SLAs | Improves IT resource productivity, and reduces education and training costs required to support component SLM products
IBM patent-pending trend analysis | Identifies IT service delivery problems before they occur, allowing you to take action to maintain service levels rather than simply report them | Maintains customer productivity and satisfaction with the services they depend on to meet business objectives
Manage service level business schedules across existing IT infrastructure | Leverages existing systems management applications, and associates service delivery with business operations | Provides business-level definition and management of IT infrastructure and increases ROI of existing systems management tools
Flexible, Web-based reporting | Identifies problem areas, providing executive summary, and detailed operations status of SLAs | Helps communicate the business value of IT resources and can justify cost expenditures
Tivoli Data Warehouse | Provides open, extensible aggregation point for all systems management data (including non-Tivoli data), and cross-domain reporting | Leverages business intelligence tools for data mining, and provides an open interface to include additional monitoring data in SLAs

3.4.4 Key concepts in IBM Tivoli Service Level Advisor

To understand IBM Tivoli Service Level Advisor, you need to be familiar with the concepts of offerings, realms, and customers. For a full explanation of these concepts, see Creating SLAs with IBM Tivoli Service Level Advisor 2.1, SC32-1247.

Offerings

An offering is a template used to describe a service, with agreed service levels, that forms the basis for SLAs in which it is ultimately included. Offerings can be differentiated to provide service level choices to customers, such as Gold, Silver, and Bronze services, or any other naming convention that suggests a unique level of service.

An offering is associated with a business schedule that is defined with one or more schedule periods. Each schedule period is associated with a unique schedule state, such as peak, prime, standard, off hours, and others. Each of these states can be configured to represent a unique level of service for that schedule period. As a result, you can offer a wide range of service levels in your offering, while also providing for scheduled outages for maintenance or other downtime activities.

Realms and customers

IBM Tivoli Service Level Advisor provides mechanisms called realms and customers to segregate data to ensure that reporting information is made available only to the appropriate people.

Realms

The highest level of segregation is called a realm. A realm contains one or more customers. For example, you may create a realm for all customers in the United States and another realm for customers in Europe.
You might also create a realm for customers in a particular line of business within your organization or another grouping that makes sense for your enterprise. Customers can be associated with more than one realm.
  • Customers

The second level of segregation is called a customer. A customer must be associated with at least one realm.

When SLAs are defined in IBM Tivoli Service Level Advisor, they are associated with both realms and customers. When IBM Tivoli Service Level Advisor users are given access to reporting functionality, they are given permission to access specific realms and customers. They are unable to view data related to realms or customers for which they have not been granted permissions.

3.4.5 IBM Tivoli Service Level Advisor architecture

Figure 3-6 shows the high level architecture of the IBM Tivoli Service Level Advisor. The components are described in the following paragraphs. We recommend that you install the components of IBM Tivoli Service Level Advisor inside a firewall if possible.

Figure 3-6 IBM Tivoli Service Level Advisor architecture
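The realm and customer segregation described in 3.4.4 amounts to a two-level permission check, which the following Python sketch illustrates. It is an assumption-laden model, not the product's security implementation: the `Sla` and `User` classes, the rule that a user needs permission for both the realm and the customer of an SLA, and the realm and customer names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sla:
    """An SLA associated with one realm and one customer (hypothetical model)."""
    name: str
    realm: str
    customer: str


@dataclass
class User:
    """A reporting user with explicitly granted realms and customers."""
    name: str
    realms: set = field(default_factory=set)
    customers: set = field(default_factory=set)


def visible_slas(user, slas):
    """Return only the SLAs whose realm and customer the user may access."""
    return [s for s in slas
            if s.realm in user.realms and s.customer in user.customers]


slas = [
    Sla("US web hosting", "United States", "Acme"),
    Sla("EU web hosting", "Europe", "Acme"),
    Sla("US payroll", "United States", "Globex"),
]
analyst = User("analyst", realms={"United States"}, customers={"Acme"})

print([s.name for s in visible_slas(analyst, slas)])
# ['US web hosting']
```

Because the same customer can appear in more than one realm, checking both values keeps the analyst from seeing the customer's European SLA.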
  • The SLM server

The SLM server performs the main functions necessary for SLM, including:
– Processing SLAs
– Scheduling and performing evaluation and trend analysis of measurement data
– Storing the results of the analysis
– Notifying of violations or trends toward violations of SLAs

SLM reports

The report servlets use the functions of the IBM WebSphere Application Server to obtain SLA results data and generate summary reports in the form of tables and graphs that can be displayed in a Web browser. The enterprise can use these servlets to create customized Web pages for customers, displaying results of evaluation and trend analyses, such as:
– Actual level of service provided
– Number of SLA violations
– Trends toward future violations

SLM administration server

The SLM administration server provides a Web-based interface in a WebSphere environment for:
– Creating offerings and SLAs
– Specifying schedules and defining peak times and other schedule states (such as standard, prime, off hours, and others) for varying levels of service
– Specifying how often evaluation and trend analysis should be performed
– Specifying breach values for metrics associated with offerings
– Managing active SLAs

IBM Tivoli Service Level Advisor databases

IBM Tivoli Service Level Advisor depends on three main databases for its operation:
– The central data warehouse database from Tivoli Data Warehouse
– The SLM database
– The SLM measurement data mart
  • The central data warehouse database

The central data warehouse database component of Tivoli Data Warehouse serves as the main repository for historical data that is used by applications such as IBM Tivoli Service Level Advisor. Tivoli Data Warehouse is the source for resource related data. It is also where the various Tivoli performance and availability monitoring applications send their data for long-term storage.

The SLM database

The SLM database serves several purposes:
– Stores information from Tivoli Data Warehouse that defines possible combinations of resources and metrics that are available to the customer to be used in SLAs
– Stores information specific to the definition and management of schedules, offerings, customers, realms, and SLAs
– Stores the results of the analysis and trend evaluation processes, when SLOs are compared to expected results

From this information, the customer can view summarized reports that indicate how well services are being delivered.

The SLM measurement data mart

The SLM measurement data mart is the database that contains a subset of the measurement data from Tivoli Data Warehouse that is of interest to IBM Tivoli Service Level Advisor in the evaluation and reporting of SLA conformance. It is updated on a regular basis with the latest metric data from Tivoli Data Warehouse.

3.5 IBM Tivoli Monitoring for Transaction Performance

IBM Tivoli Monitoring for Transaction Performance is a centrally managed suite of software components. These components monitor the availability and performance of Web-based services and Microsoft Windows applications.

For more information about IBM Tivoli Monitoring for Transaction Performance, refer to IBM Tivoli Monitoring for Transaction Performance Administrator’s Guide Version 5.3, GC32-9189. This section provides a basic overview of the product, its components, and functions as needed to understand and implement BSM.
  • 3.5.1 Business goals

IBM Tivoli Monitoring for Transaction Performance typically addresses these business goals:
– Improving customer satisfaction by being aware of the client user experience and resolving issues quickly
– Improving the analysis of faults in applications to enable more rapid repairs
– Providing measurements based on application response times and availability to use in SLAs

3.5.2 High level description and main functions

IBM Tivoli Monitoring for Transaction Performance captures detailed performance data for all of your on demand business transactions. You can use this software to perform the following on demand business management tasks:

Monitor transactions: You can monitor every step of an actual customer transaction as it passes through the complex array of hosts, systems, and applications:
  – Web and proxy servers
  – Web application servers
  – Database management systems
  – Legacy back-office systems and applications
Simulate customer transactions: While mimicking the behavior of real users performing standard tasks, you can collect performance data that helps you assess the health of your on demand business components and configurations under different conditions and at different times.
Reporting: You can produce comprehensive real-time reports that display recently collected data in a variety of formats and from a variety of perspectives. By integrating with Tivoli Data Warehouse, you can store collected data for use in historical analysis and long-term planning.
Notification of performance issues: You can receive prompt automated notification of performance problems either directly through a console or by integration with IBM Tivoli Enterprise Console and IBM Tivoli Business Systems Manager.
Root cause analysis: You can quickly isolate the source of performance problems as they occur, so that you can correct those problems before they produce expensive outages and lost revenue.
  • 3.5.3 Benefits of using IBM Tivoli Monitoring for Transaction Performance

Table 3-4 summarizes the main advantages and business benefits of using the key features of IBM Tivoli Monitoring for Transaction Performance.

Table 3-4 Benefits of IBM Tivoli Monitoring for Transaction Performance features
Features | Advantages | Benefits
Robotic synthetic transactions | Provides a view of the experience of real application users | Enables early identification and resolution of service shortcomings
Transaction decomposition | Goes beyond the “black box” view of an application to understand the component causing service issues; support staff needs to know less about the application architecture to identify root causes | Faster identification and resolution of problems with application availability and performance
IBM Tivoli Enterprise Console integration | Enables events to be forwarded to the IBM Tivoli Enterprise Console and acted on by operators | Console consolidation means there is less chance of missing service issues
IBM Tivoli Business Systems Manager integration | Enables the business impact of events to be assessed and to enable escalation | Ensures focus on the most important issues based on the business impact of a fault
Tivoli Data Warehouse integration | Enables long-term storage of performance and availability data and supports the use of data in SLAs created with IBM Tivoli Service Level Advisor | Reduced data storage costs and the creation of meaningful SLAs

3.5.4 Key concepts in IBM Tivoli Monitoring for Transaction Performance

To understand IBM Tivoli Monitoring for Transaction Performance, you must be familiar with the concepts of Application Response Measurement (ARM), record and playback, and Java 2 Platform, Enterprise Edition (J2EE) monitoring. For a full explanation about these concepts, see IBM Tivoli Monitoring for Transaction Performance Administrator’s Guide, GC32-9189.
Application Response Measurement

The ARM application programming interface (API) is the key technology used by IBM Tivoli Monitoring for Transaction Performance to capture transaction performance information. The ARM standard describes a common method for integrating enterprise applications as manageable entities. It allows users to extend their enterprise management tools directly to applications, creating a comprehensive end-to-end management capability that includes measuring application availability, application performance, application usage, and end-to-end transaction response time. The ARM API defines a small set of functions that can be used to instrument an application to identify the start and stop of important transactions.

IBM Tivoli Monitoring for Transaction Performance provides an ARM engine to collect the data from ARM-instrumented applications. This is a multithreaded application implemented as the tapmagent that exchanges data through an IPC channel, using the libarm library, with ARM-instrumented applications. Data is collected and aggregated to generate useful information. It is correlated with other transactions, and then thresholds are checked against policies. Data is forwarded to the management server and placed into the database for reporting purposes.

IBM Tivoli Monitoring for Transaction Performance Version 5.3 also provides a generic ARM component for more transaction monitoring coverage. The generic ARM capability enables you to monitor custom ARM-instrumented applications.

Note: ARM instrumentation does not support a 64-bit Java Virtual Machine (JVM).

The ARM engine notifies the IBM Tivoli Monitoring for Transaction Performance Management Server of transaction violations, new edge transactions appearing, and edge transaction status changes.

The following paragraphs provide an overview of the transaction correlation provided by IBM Tivoli Monitoring for Transaction Performance. For additional information, including instrumenting applications using ARM, see the IBM Tivoli Monitoring for Transaction Performance Administrator’s Guide Version 5.3, GC32-9189.

ARM correlation is the method by which parent transactions are mapped to their respective child transactions across multiple processes and multiple servers. Each IBM Tivoli Monitoring for Transaction Performance component is automatically ARM-instrumented and generates a correlator. The initial root/parent or edge transaction is the only transaction that does not have a parent correlator. From there, IBM Tivoli Monitoring for Transaction Performance can automatically connect parent correlators with child correlators to trace the path of a distributed transaction through the infrastructure. It provides the mechanisms to easily visualize this through the topology views.
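The parent/child correlation described above can be illustrated with a small sketch. This is not the ARM API and not the product's implementation: the `Transaction` class, the `trace_path` function, the integer correlators, and the transaction names are all hypothetical, chosen only to show how an edge transaction (one with no parent correlator) can be linked to its children so that the path of a distributed transaction can be reconstructed.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical correlator source: real ARM correlators are opaque byte
# strings, a simple counter is used here for readability.
_next_id = itertools.count(1)


@dataclass
class Transaction:
    name: str
    parent_correlator: Optional[int] = None
    correlator: int = field(default_factory=lambda: next(_next_id))

    def child(self, name: str) -> "Transaction":
        # A child transaction carries its parent's correlator.
        return Transaction(name, parent_correlator=self.correlator)

    @property
    def is_edge(self) -> bool:
        # The edge (root) transaction is the only one without a parent.
        return self.parent_correlator is None


def trace_path(leaf: Transaction, by_correlator: Dict[int, Transaction]) -> List[str]:
    """Walk parent correlators from a leaf back to the edge transaction."""
    path = [leaf.name]
    current = leaf
    while not current.is_edge:
        current = by_correlator[current.parent_correlator]
        path.append(current.name)
    return list(reversed(path))


# Hypothetical three-tier transaction.
edge = Transaction("GET /checkout")
servlet = edge.child("CheckoutServlet")
query = servlet.child("JDBC query")
index = {t.correlator: t for t in (edge, servlet, query)}
print(trace_path(query, index))
```

In this sketch each process only needs the correlator it received from its caller; the full path is assembled afterward from the collected records, which mirrors the idea of connecting parent correlators with child correlators.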
  • IBM Tivoli Monitoring for Transaction Performance implements the following ARM correlation mechanisms: Parent-based aggregation: This process collects transaction performance data on the parent of a subtransaction and displays transaction performance relative to its path. This provides the ability to monitor the connection points between transactions. It also monitors path-based transaction performance across farms of servers providing the same function. Policy-based correlators: A portion of the correlator is used to pass a unique policy identifier within the correlator. The associated policy controls the amount of data collected and the thresholds associated with that data. Instance and aggregated performance statistics: This provides both additional metrics and a complete and exact trace of the path taken by a specific transaction. Parent performance initiated trace: The trace flag within the ARM correlator is used by the agent in the trace field for transactions that are performing outside of their threshold. This provides for the dynamic collection of instance data across all systems where this transaction executes. Sibling transaction ordering: This is the ability to determine the order of execution of a set of child transactions relative to each other. Aggregated correlation: IBM Tivoli Monitoring for Transaction Performance carries out aggregated correlation. This provides a summary of a transaction over a period of time rather than a record for each and every instance of a transaction. Record and playback Record and playback records Web transactions and Microsoft Windows applications, which you can play back to assess transaction performance and availability. Performance data helps determine if a transaction is performing as expected and exposes problem areas of your Web and application environment. IBM Tivoli Monitoring for Transaction Performance provides two playback components. Each is paired with an application that records transactions. 
Synthetic Transaction Investigator (STI) Recorder and STI: The STI Recorder records a sequence of steps for a Web transaction, such as searching for information or purchasing an item from an online supplier. An STI playback policy instructs the STI component to play back the recorded transaction and collect performance data. Rational® Robot and Generic Windows: The Rational Robot, which is provided with IBM Tivoli Monitoring for Transaction Performance but installed as a separate application, records actions in a Microsoft Windows application.82 Service Level Management
  • The Generic Windows component plays back a Rational Robot recording to provide timing measurements. J2EE instrumentation IBM Tivoli Monitoring for Transaction Performance provides enhanced J2EE instrumentation capabilities. The collection of ARM data generated by J2EE applications is invoked from the management server and is controlled by user-configured policies. The monitoring policy is then distributed to the management agent. The transactions to monitor are specified using edge definitions, for example, the first URI invoked when using the application. It is possible to define the level of monitoring for each edge. To monitor a J2EE application server, the computer must be running the IBM Tivoli Monitoring for Transaction Performance Agent. A single IBM Tivoli Monitoring for Transaction Performance agent can monitor multiple J2EE application servers on the management agent’s host. IBM Tivoli Monitoring for Transaction Performance J2EE monitoring uses Java byte-code insertion (BCI).3.5.5 IBM Tivoli Monitoring for Transaction Performance architecture The IBM Tivoli Monitoring for Transaction Performance management server is a J2EE application deployed onto the WebSphere Application Server platform. A high level view of the architecture is provided in Figure 3-7. IBM Tivoli Monitoring for Transaction Performance has the following physical components: Management server: This server provides the services and user interface needed for centralized management. Management agent: These agents are installed on computers across the environment to run discovery operations and collect performance data for monitored transactions. Store and forward management agent: This component enables transfer of data across firewalls. ARM engine: This component handles internal systems management data passed from business applications that have been ARM instrumented. The following sections explain each of these components further. Chapter 3. 
  • Figure 3-7 IBM Tivoli Monitoring for Transaction Performance architecture The management server The management server is the control center for the IBM Tivoli Monitoring for Transaction Performance installation. It is shared by all IBM Tivoli Monitoring for Transaction Performance components. The management server collects information from and provides services to deployed management agents. Deployed as a standard IBM WebSphere Application Server Enterprise Archive (EAR) file, the management server provides the following functions: User interface: This interface is accessed via a browser and has many uses including: – Creating and scheduling policies to instruct monitoring components to collect performance data – Establishing acceptable performance metrics or thresholds, defining notifications for threshold violations and recoveries – Viewing reports and system events – Managing schedules84 Service Level Management
- Real-time reports: This interface is also accessed by a browser and provides graphical displays of performance data collected by the monitoring and playback components. There are reports to help you assess the performance and availability of your Web sites and Microsoft Windows applications.
- Event generation: Application events are generated when performance thresholds are exceeded; system events are generated for system errors and notifications. Events can be viewed, and event severities can be configured to determine what action is taken when they are generated. The management server can send e-mail notification to specified recipients, run a specified script, or forward selected event types to the IBM Tivoli Enterprise Console or as Simple Network Management Protocol (SNMP) traps.
- Storage of policies and data: The management server controls a set of databases that store policy information, events, and performance data collected by management agents.
- Communication with management agents: The management server uses Web services and the Secure Sockets Layer (SSL) to communicate with the management agents. ARM data is uploaded to the management server from management agents at regularly scheduled intervals (the upload interval). By default, the upload interval is once per hour.

The management agent

Management agents, based on Java Management Extensions (JMX), are installed on computers across your environment. Management agents provide the following functions:

- Discovery: This enables automatic identification of incoming Web transactions that may need to be monitored.
- Listening and playback monitoring: A management agent can have listening and playback components installed that run policies at scheduled times. The management agent sends any events generated during a listening or playback operation to the management server, where event information is made available in event views and reports.
- ARM engine for data collection: A management agent uses the ARM API to collect performance data. Each of the listening and playback components is instrumented to retrieve the data using ARM standards.
- Policy implementation: When a discovery, listening, or playback policy is created, an agent group is assigned to run the policy. You define agent groups to include one or more management agents that are equipped to run the same policy. For example, if you want to monitor the performance of a consumer banking application that runs on several WebSphere application servers, each of which is associated with a management agent and a J2EE monitoring component, you can create an agent group named All J2EE Servers. All of the management agents in the group can run a J2EE listening policy that you create to monitor the banking application.
- Threshold checking: When performance thresholds in listening or playback policies are exceeded, the management agent sends events to the management server. Events can be set for transactions and, in many cases, for the subtransactions within a transaction. A subtransaction is one step in an overall transaction.

Store and forward management agent

Store and forward can be implemented on one or more management agents (typically only one) to handle firewall situations.

Important: Store and forward cannot work with proxies.

In general, you need one store and forward management agent for each firewall that has to be traversed. Store and forward management agents perform these firewall-related tasks:

- Enabling point-to-point connections between management agents and the management server
- Enabling management agents to interact with store and forward as though store and forward were a management server
- Routing requests and responses to the correct target
- Supporting SSL communications
- Supporting one-way communications through a firewall

The ARM engine

When you install and configure a management agent, the ARM engine is automatically installed as part of the management agent. The ARM engine and ARM API comply with the ARM 2.0 and 4.0 specifications.

The ARM specification was developed to meet the challenge of tracking performance through complex, distributed computing networks. ARM provides a way for business applications to pass information about the subtransactions they initiate in response to service requests that flow across a network. This information can be used to calculate response times, identify subtransactions, and provide additional data to help you determine the cause of performance problems.
The Generic ARM component (new in Version 5.3 of IBM Tivoli Monitoring for Transaction Performance) enables you to monitor the performance of any ARM 2.0- or 4.0-instrumented application. You can monitor both ARM-instrumented products from independent software vendors (ISVs) and custom in-house applications. The Generic ARM component can also detect and monitor custom metrics that are recorded from these ARM-instrumented applications.

All transaction data collected by the Quality of Service, J2EE, STI, and Generic Windows monitoring components of IBM Tivoli Monitoring for Transaction Performance is collected by ARM.

3.6 IBM Tivoli Enterprise Console

IBM Tivoli Enterprise Console provides a focal point for events coming from monitoring products installed in a distributed systems environment. It is usually associated with implementation of Tivoli Framework products but can also handle event information sent using SNMP. For more information about IBM Tivoli Enterprise Console, refer to IBM Tivoli Enterprise Console User’s Guide 3.9, SC32-1235.

3.6.1 Business goals

IBM Tivoli Enterprise Console typically addresses these business goals:

- Increasing efficiency of operations staff by providing a single event console
- Reducing operational costs by automating fixes to common problems
- Providing an effective and automated incident escalation solution

3.6.2 High level description and main functions

The IBM Tivoli Enterprise Console product is a rule-based event management application. It integrates system, network, database, and application management to help ensure the optimal availability of the IT resources in an enterprise.
The main functions of the IBM Tivoli Enterprise Console are: To provide a centralized, global view of your computing enterprise To collect, process, and automatically respond to common management events, such as a database server that is not responding, a lost network connection, or a successfully completed batch processing job To act as a central collection point for alarms and events from a variety of sources, including those from other Tivoli software applications, Tivoli partner applications, custom applications, network management platforms, and relational database systems Chapter 3. IBM Tivoli products that assist in service level management 87
- To forward appropriate events to the IBM Tivoli Business Systems Manager to enable it to determine the business impact of faults

The Tivoli Enterprise Console product helps you effectively process the high volume of events in an IT environment by:

- Prioritizing events by their level of importance
- Filtering redundant or low-priority events
- Correlating events with other events from different sources
- Determining who should view and process specific events
- Initiating automatic corrective actions, when appropriate, such as escalation notification, and opening trouble tickets
- Identifying hosts and automatically grouping events from the hosts that are in maintenance mode in a predefined event group

3.6.3 Benefits of using IBM Tivoli Enterprise Console

Table 3-5 summarizes the main advantages and business benefits of using the key features of IBM Tivoli Enterprise Console.

Table 3-5 Benefits of IBM Tivoli Enterprise Console features

Features | Advantages | Benefits
Event filtering | Events requiring no further action are not displayed on the console | Operators can focus on the significant events
Event correlation | Operators focus on the cause of faults rather than the symptoms | More rapid fault resolution
Automatic escalation | Significant faults that are not noticed or not yet worked on are escalated automatically | Improvement in service availability
IBM Tivoli Business Systems Manager Integration | Enables the business impact of events to be assessed and escalated | Ensures focus on the most important issues based on the business impact of a fault
Tivoli Data Warehouse integration | Enables long-term storage of performance and availability data and supports the use of the data in SLAs created with IBM Tivoli Service Level Advisor | Reduced data storage costs and the creation of meaningful SLAs
3.6.4 Key concepts of event groups in IBM Tivoli Enterprise Console

To understand IBM Tivoli Enterprise Console, you need to be familiar with the concepts of event groups. This section introduces you to event groups. However, you can find a detailed explanation in IBM Tivoli Enterprise Console Installation Guide Version 3.9, SC32-1233.

An event group is a configured logical area of responsibility that is used to notify users that an event matching a specified set of criteria has occurred. An administrator configures event groups using the Java version of the event console. For example, if your network contains a group of computers that are used for critical work, you may want to create an event group that receives events for these critical computers. This logical grouping of events is an event group.

To define an event group, you must specify the selection criteria for the events in the group. This data constitutes an event group filter. An event group filter can include any event attribute except for extended or customer-defined attributes. Table 3-6 lists some of the more commonly used attributes for event group filtering.

Table 3-6 Attributes for event group filtering

Attribute name | Description
Event class | Specifies the class of the event, as assigned by the event source that forwards the event
Origin | Identifies the protocol address or host name of a host from which you want to receive events
Severity | Specifies the severity of the event from Unknown, through Harmless to Fatal
Source | Specifies the type of application that created the event
Status | The status of the event, which could have various states including Open, Closed, and Acknowledged
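The idea of an event group filter can be sketched in a few lines of Python. This is only an illustration of matching events against selection criteria built from the attributes in Table 3-6; it is not how the event console stores or evaluates filters. The function name `matches`, the dictionary layout, and all attribute values (host addresses, class names, severity strings) are hypothetical.

```python
from typing import Dict, List, Set

Event = Dict[str, str]
GroupFilter = Dict[str, Set[str]]


def matches(event: Event, group_filter: GroupFilter) -> bool:
    """An event belongs to the group if every filtered attribute matches."""
    return all(event.get(attr) in allowed for attr, allowed in group_filter.items())


# Hypothetical event group for a set of critical computers: only open,
# high-severity events from two named hosts are selected.
critical_hosts: GroupFilter = {
    "origin": {"10.0.0.5", "10.0.0.6"},
    "severity": {"CRITICAL", "FATAL"},
    "status": {"OPEN"},
}

# Hypothetical incoming events.
events: List[Event] = [
    {"event_class": "Disk_Full", "origin": "10.0.0.5", "severity": "FATAL",
     "source": "LOGFILE", "status": "OPEN"},
    {"event_class": "Job_Done", "origin": "10.0.0.5", "severity": "HARMLESS",
     "source": "LOGFILE", "status": "OPEN"},
    {"event_class": "Disk_Full", "origin": "10.0.0.9", "severity": "FATAL",
     "source": "LOGFILE", "status": "OPEN"},
]

selected = [e for e in events if matches(e, critical_hosts)]
print([(e["event_class"], e["origin"]) for e in selected])
```

Attributes that are not named in the filter (here, event class and source) are simply not constrained, so the same structure can express broad or narrow groups.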
3.6.5 IBM Tivoli Enterprise Console architecture

A high level view of the architecture of IBM Tivoli Enterprise Console is provided in Figure 3-8. The key components are described in the sections that follow.

Figure 3-8 IBM Tivoli Enterprise Console architecture

The IBM Tivoli Enterprise Console event server

The event server is at the heart of the IBM Tivoli Enterprise Console. It provides a centralized location for the management of events in a distributed environment. The event server processes input from event consoles and updates the event database. Event consoles read data from the event database and see the latest status of events as they are updated. The event server evaluates events against a set of rules to determine if it should automatically perform any predefined tasks or modify the event. If human intervention is required, the event server notifies the appropriate operator. The operator performs the required tasks and then notifies the event server when the condition that caused the event is resolved.

Incoming events are given a unique number and are time stamped as they are entered into the event database. They are then evaluated by the rule engine. If the rule engine is busy, events are buffered and evaluated later. Rules include the actions to be taken when an event meets the specified rule conditions. This helps to reduce the amount of interpretation and response required by operators. For example, a particular event may be known to trigger one or more instances of another event. In such a case, a rule can be used to automatically downgrade the severity of the event or close events that are known to be caused by the triggering event.

The event server can use rules to delay responses to an event. This may be used to deal with self-correcting faults, preventing an operator from needlessly responding to a problem that will shortly go away. Rules can be used, for example, to attempt to restart a router and give an operator a low-severity notice. If the attempts to restart the router within a designated time period fail, a rule can specify that attempts to retry be cancelled and that a higher-severity notice be sent to an operator. If an operator does not respond to an event after a specified period of time, the event server can take additional actions including sending an e-mail, paging the operator, or sending an e-mail notice to an alternate contact.

You can use the predefined rules that the Tivoli Enterprise Console product provides, or you can create your own. For full information about the predefined rules, see IBM Tivoli Enterprise Console Rule Set Reference Version 3.9, SC32-1282. You can find information about creating your own rules in IBM Tivoli Enterprise Console Rule Developer’s Guide Version 3.9, SC32-1234.

A rule can specify the following actions among others:

- Correlating events
- Responding automatically to events, such as running an application or script
- Delaying responses to events
- Escalating events
- Modifying event attributes
- Modifying attributes of other events
- Preventing duplicate events from being displayed
- Dispatching Tivoli or other administrative actions on resources
- Reevaluating a set of events
- Discarding an event
- Generating a new event
- Forwarding an event to another event server
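Two of the rule behaviors described above, closing events that are known to be caused by a triggering event and escalating events that have gone unanswered, can be sketched as follows. This is plain Python for illustration only; it is not the product's rule syntax, which is covered in the Rule Developer's Guide. The `Event` class, the `CAUSED_BY` table, the event class names, and the timeout value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Event:
    event_class: str
    origin: str
    severity: str = "WARNING"
    status: str = "OPEN"
    received_at: float = 0.0  # seconds, supplied by the caller


# Hypothetical knowledge: which event class is a known symptom of which cause.
CAUSED_BY = {"Host_Unreachable": "Router_Down"}


def correlate(event: Event, open_events: Iterable[Event]) -> Event:
    """Close an incoming event if its known triggering event is still open."""
    cause = CAUSED_BY.get(event.event_class)
    if cause and any(e.event_class == cause and e.status == "OPEN" for e in open_events):
        event.status = "CLOSED"
    return event


def escalate_if_unanswered(event: Event, now: float, timeout: float = 600.0) -> bool:
    """Raise the severity of an event left open longer than the timeout."""
    if event.status == "OPEN" and now - event.received_at > timeout:
        event.severity = "CRITICAL"
        return True
    return False


router = Event("Router_Down", "10.0.0.1", received_at=0.0)
symptom = correlate(Event("Host_Unreachable", "10.0.0.7", received_at=5.0), [router])
print(symptom.status)                               # symptom is closed as a known effect
print(escalate_if_unanswered(router, now=900.0))    # no acknowledgement within timeout
print(router.severity)
```

Operators then see the router fault rather than every dependent symptom, which is the reduction in interpretation that the rule mechanism is intended to provide.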
  • IBM Tivoli Enterprise Console Event database The Tivoli Enterprise Console product uses an external RDBMS to store the large amount of event data that is received. The RDBMS Interface Module (RIM) component of the Tivoli Management Framework is used to access the event database. IBM Tivoli Enterprise Console user interface server The user interface (UI) server provides communication services between the event consoles and the event server. It automatically updates the event database when, for example, an operator acknowledges an event. The UI server also provides a set of commands that enable an operator to change any event attribute, list the events in a specific event group, and display a message on the operator’s desktop. IBM Tivoli Enterprise Console Event console An event console provides the graphical user interface (GUI) used by operators to view and respond to events. IBM Tivoli Enterprise Console product provides two versions of the event console, a Java version and a Web version. Certain tasks require the Java console, but either version can be used to manage events. The event console provides a window for monitoring events based on event groups. An event group is a set of events that meets certain filter criteria. The Java event console Key features of the Java event console include: Tivoli secure logon for added security Event information directly retrieved by each event console from the database for high performance and scalability Configurable refresh rate Ability to run third-party or custom scripts and applications from the event console Ability to run predefined tasks Ability to configure automated tasks to run when a particular event is received by the event console Ability to view more help information about an event in a Web page Automatic resolution of conflicts, for example, should two operators simultaneously attempt to change the status of an event92 Service Level Management
- Support of multiple views:
  – Configuration view to configure the event consoles
  – Summary chart view to show a high-level overview of the health of resources represented by an event group
  – Priority view, in which event groups are represented by buttons with the status indicated by color

The Web event console

This is used to manage events from your Web browser and provides many of the functions available in the Java console. The Web version of the event console organizes the tasks that you can perform in a portfolio, which is titled My Work.

IBM Tivoli Enterprise Console event adapter

An event adapter is a process that typically resides on the same host as a managed source and monitors the source for events.

For example, if you want to monitor the Windows event log, you would install the Windows event log adapter on the host. When an event adapter receives information from its source, the adapter formats the information and forwards it to the event server for interpretation and response.

To reduce network traffic and event server workload, you can configure an event adapter to discard selected events instead of forwarding them all to the event server.

Tivoli Event Integration Facility

The Tivoli Event Integration Facility is a toolkit that expands the types of events and system information that you can monitor. You can use it to develop your own adapters that are tailored to your network environment and your specific needs.

Tivoli Enterprise Console gateway

The Tivoli Enterprise Console gateway receives events from TME® and non-TME adapters and forwards them to an event server. The Tivoli Enterprise Console gateway provides the following benefits:

- Greater scalability, which allows you to manage sources with less software running on the endpoints
- Improved performance of the event server
- Simple deployment of adapters and updates
- Event correlation and filtering closer to the sources, decreasing the amount of network traffic
  • Adapter Configuration Facility The Adapter Configuration Facility provides a GUI to configure and distribute TME adapters. You can use the Adapter Configuration Facility to create profiles for adapters and set adapter configuration and distribution options. Tivoli NetView IBM Tivoli NetView provides the network management function for the IBM Tivoli Enterprise Console product. It monitors the status of network devices and automatically filters and forwards network-related events to IBM Tivoli Enterprise Console.3.7 IBM Tivoli Monitoring IBM Tivoli Monitoring provides automated monitoring of essential IT system resources. For more information about IBM Tivoli Monitoring, refer to IBM Tivoli Monitoring User’s Guide version 5.1.2, SH19-4569-03.3.7.1 Business goals Typical business goals addressed by IBM Tivoli Monitoring are: Provision of high quality services Proactive monitoring of services Making the best value of the IT infrastructure3.7.2 High level description and main functions IBM Tivoli Monitoring applies pre-configured best practices to the automated monitoring of essential IT system resources. The application detects bottlenecks and other potential problems, provides for the automatic recovery from critical situations, and eliminates the need for system administrators to scan manually through extensive performance data. IBM Tivoli Monitoring integrates seamlessly with other Tivoli availability solutions, including IBM Tivoli Business Systems Manager and IBM Tivoli Enterprise Console. It was previously called Tivoli Distributed Monitoring (Advanced Edition). Most features of IBM Tivoli Monitoring can be used as supplied, or modified using the GUI or command line interface (CLI) provided. The main features of Tivoli Monitoring are: An off-the-shelf solution for monitoring Windows, UNIX, Linux®, and OS/400® systems, with data collection and problem analysis performed locally94 Service Level Management
- Ready-to-use resource models that report on specific aspects of a system’s status. For example, the Process resource model provides information about the status of processes, CPU usage, and so forth.
- The ability to add resource models to a Tivoli profile, which can be distributed to multiple systems simultaneously
- The ability to modify resource models by changing, for example, threshold levels to match specific requirements
- The ability to view both real-time and historical data for any system from a centralized monitoring application, called the Web Health Console, which is supplied with the product
- The ability to send the results of data collection and analysis to the IBM Tivoli Enterprise Console or to the IBM Tivoli Business Systems Manager
- The ability to specify automatic corrective or preventive actions to resolve situations that could develop into real problems
- The ability to schedule monitoring to take place at user-specified times
- A heartbeat function that regularly checks the availability and status of attached endpoints and makes the information available to the IBM Tivoli Enterprise Console server, IBM Tivoli Business Systems Manager, or Tivoli Monitoring Notice Group

3.7.3 Benefits of using IBM Tivoli Monitoring

Table 3-7 summarizes the main advantages and business benefits of using the key features of IBM Tivoli Monitoring.

Table 3-7 Benefits and advantages of IBM Tivoli Monitoring features

Features | Advantages | Benefits
Out-of-the-box resource models | Little or no configuration required to start monitoring | Rapid ROI on implementation
Heartbeat function | Rapid and automatic notification of resources that cannot be contacted | More responsive fault resolution leading to increased customer satisfaction
Web Health Console | Ability to view real-time and historical data for a resource | Better informed problem analysis
IBM Tivoli Enterprise Console Integration | Enables events to be forwarded to IBM Tivoli Enterprise Console | Console consolidation means less chance of missing service issues
IBM Tivoli Business Systems Manager Integration | Enables the business impact of events to be assessed and to enable escalation | Ensures focus on the most important issues based on the business impact of a fault
Tivoli Data Warehouse Integration | Enables long-term storage of performance and availability data and supports the use of data in SLAs created with IBM Tivoli Service Level Advisor | Reduced data storage costs and the creation of meaningful SLAs

3.7.4 Key concepts in IBM Tivoli Monitoring

To understand IBM Tivoli Monitoring, you need to be familiar with the concepts presented in the following sections.

Resource models

In IBM Tivoli Monitoring terminology, a resource model is defined as “the logical modeling of one or more resources, along with the logic on which cyclical data collection, data analysis, and monitoring are based.”

In practical terms, a resource model is a pre-built set of rules for monitoring a resource using IBM Tivoli Monitoring. It is installed, for example, on a server, and may take corrective action or send an event if an exception condition is detected.

IBM Tivoli Monitoring provides a range of out-of-the-box, predefined resource models to specify which resource data is accessed from the system at runtime and how this data is processed. For example, the Process resource model obtains data related to processes running on the system. Performance data is automatically collected by the resource model and processed by an appropriate algorithm to determine whether the system is performing to your expectations.

Generally, you can use the resource model default values and still obtain useful data. However, if necessary, you can customize the resource models to suit your requirements or even build your own resource models using the IBM Tivoli Resource Model Builder.

For details about the resource models supplied with the product, see IBM Tivoli Monitoring Version 5.1.2 Resource Model Reference Guide, SH19-4570-03.
For guidance about creating resource models, see IBM Tivoli Resource Model Builder Version 1.1.3 User’s Guide, SC32-1391-02.

Cycles and thresholds

Resource models run on a cyclical basis. A resource model installed at an endpoint gathers data at regular intervals, known as cycles. The duration of a cycle is the cycle time. A resource model with a cycle time of 60 seconds gathers information every 60 seconds. The data collected is a snapshot of the status of the resources specified in the resource model. Each of the supplied resource models has a default cycle time, which you can modify.

Each resource model defines one or more thresholds. A threshold is a named property of the resource with a default value that you can modify in the customization phase. Typically, the value specified for a threshold represents a significant reference level of a performance-related entity. If the level is exceeded or not reached, the operator or system administrator should be notified.

Indications

Each resource model generates an indication if certain conditions implied by the resource model’s thresholds are not satisfied in a given cycle. Each resource model has its own algorithm to determine which combinations of thresholds should generate an indication.

Indications may be generated in either of the following circumstances:
– A single threshold is exceeded: For example, in the Windows Process resource model, the Process High CPU indication is generated when the High CPU Usage threshold is exceeded.
– A combination of two or more thresholds is exceeded: For example, in the Windows Logical Disk resource model, a High Read Bytes per Second indication is generated when both of the following thresholds are exceeded:
  – The number of bytes transferred per second (being written or read) exceeds the High Bytes per Second threshold.
  – The percentage of time that the selected disk drive spends on read or write requests exceeds the High Percent Usage threshold.

Occurrences and holes

IBM Tivoli Monitoring resource models do not look only for conditions that exceed thresholds once. They can also look for a pattern of repeats over time. An occurrence is a cycle during which an indication occurs for a given resource model. A hole is a cycle during which an indication does not occur for a given resource model.

Resource models can compare a series of measurements with a given pattern of occurrences and holes to determine whether further action is needed. This approach provides much greater flexibility and avoids raising events prematurely. It is explained in detail, with examples, in IBM Tivoli Monitoring Version 5.1.2 Resource Model Reference Guide, SH19-4570-03.
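The interplay of occurrences and holes can be pictured with a small sketch. It is not the product's implementation. It assumes one plausible reading of the pattern: an event is raised once a configured number of occurrences has been seen, and the count restarts if more than a configured number of consecutive holes intervenes. The function name and both parameters are hypothetical.

```python
def event_cycles(indications, occurrences_needed, holes_allowed):
    """Return the cycle indexes at which an event would be raised.

    indications: one boolean per cycle (True = occurrence, False = hole).
    occurrences_needed / holes_allowed: hypothetical tuning parameters,
    assumed here for illustration only.
    """
    raised = []
    occurrences = 0
    holes = 0
    for cycle, is_occurrence in enumerate(indications):
        if is_occurrence:
            occurrences += 1
            holes = 0
            if occurrences == occurrences_needed:
                raised.append(cycle)
                occurrences = 0  # start counting a new pattern
        elif occurrences > 0:
            holes += 1
            if holes > holes_allowed:
                # Too many consecutive holes: the pattern is broken.
                occurrences = 0
                holes = 0
    return raised
```

With three occurrences required and one hole tolerated, the series occurrence, hole, occurrence, occurrence raises an event in the fourth cycle, whereas a single isolated occurrence never does. This is the sense in which the pattern avoids raising events prematurely.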
The heartbeat function

In addition to the monitoring processes described earlier, IBM Tivoli Monitoring operates a heartbeat function. This function monitors the basic system status at endpoints attached to the gateway at which it is enabled. In essence, this function checks regularly to determine whether resources can be reached in the network. If not, events may be sent to IBM Tivoli Enterprise Console, IBM Tivoli Business Systems Manager, and the IBM Tivoli Monitoring Notice Group.

3.7.5 IBM Tivoli Monitoring architecture

Figure 3-9 shows a high-level view of the architecture of IBM Tivoli Monitoring. The key components are described in the sections that follow.

Figure 3-9 IBM Tivoli Monitoring components
The IBM Tivoli Monitoring Base component

Install this component on the Tivoli management region server and on all gateways with endpoints that you want to monitor. It provides a GUI and a CLI that are available at both the server and gateway. You can control all functions of the product from either node. You can also configure the component to operate the heartbeat function for all endpoints directly attached to the system on which it is installed.

IBM Tivoli Monitoring Web Health Console

The Web Health Console is the Web-based graphical interface for Tivoli Monitoring. It allows you to view real-time information about a specific problem and check the status (or health) of a set of endpoints. You can use the Web Health Console to work with real-time data or with historical data that was previously logged to a local database.

IBM Tivoli Monitoring Endpoint component

The endpoint component, which requires a Tivoli management agent, performs resource management through one or more resource models that are distributed to the endpoint with a Tivoli Monitoring profile. The endpoint component is installed automatically when a Tivoli Monitoring profile is distributed to the endpoint for the first time.

The IBM Tivoli Monitoring TBSM Adapter

This component feeds discovery information and IBM Tivoli Monitoring events to IBM Tivoli Business Systems Manager.

The Gathering Historical Data component

This component enables IBM Tivoli Monitoring to use Tivoli Decision Support for Server Performance Prediction (Advanced Edition). It uses data collected by specific IBM Tivoli Monitoring resource models to populate a database on the Tivoli server where it is installed. The collected data is aggregated every 24 hours and added to the IBM Tivoli Monitoring database.

The Tivoli Data Warehouse Support component

This component enables the integration of IBM Tivoli Monitoring with Tivoli Data Warehouse. Getting data into Tivoli Data Warehouse enables more sophisticated data analysis and makes it possible to use IBM Tivoli Monitoring data in SLAs through IBM Tivoli Service Level Advisor.
3.8 Bringing it all together in support of SLM processes

So far this chapter has provided an overview of the IBM Tivoli products involved in supporting the implementation of SLM processes. This section provides a technical description of how you can use these products to support the implementation of SLM processes.

IBM Tivoli products focus on specific areas of expertise and provide a wide range of features unmatched by any other vendor. Together they are well suited to address every stage of the SLM process that is illustrated by Figure 3-10.

Figure 3-10 An integrated view of SLM, BSM, and monitoring in process context

How can you integrate the existing Tivoli products to maximize their value in support of the process illustrated by Figure 3-10? Because software products are simply tools in support of processes deployed by an IT organization, and their solutions vary with each IT organization, the following sections outline a generic integration approach that is represented by Figure 3-10.
The integration approach addresses the following elements:
– Service definitions
– Real-time monitoring
– Historical monitoring
– Fault management
– SLA reporting and alerting
– Problem and change management

3.8.1 Service definitions

SLM requires an IT organization to establish service definitions by cataloging IT services and identifying resources used by each IT service. Service definitions must reflect the actual relationships between IT services and resources.

The real benefit of IBM Tivoli Business Systems Manager comes from the ability to create collections of resources that represent business systems, such as key business processes and applications. Tivoli Business Systems Manager discovers IT resources and relationships and allows an IT organization to construct business systems and map resources and associated events to business systems.

Tivoli Business Systems Manager uses two different methods to discover resources and their relationships as they exist in the real world. The first method is a set of explicit discovery routines that periodically scan a particular environment and return the components within that environment. The second method listens for and processes incoming events that signal new resources within the environment and then performs resource creation.

The Tivoli Business Systems Manager object model maps discovered resources and their relationships hierarchically as they exist in an IT infrastructure. This physical resource pool becomes the source for business system construction that enables management by business services. The Tivoli Business Systems Manager object model includes definitions for many of the thousands of different resource types that can be found within an IT infrastructure. The model can be extended to include additional resource types.

Business systems can contain any type of resource and be organized in any manner that suits user needs. For example, business systems can model resources within a service, application, geography, or area of responsibility. They can be converted into services as required and made available for executive dashboard views and SLA alerting. For information about business systems construction, see 4.2.2, “Basic business system building” on page 119.

Tivoli Business Systems Manager provides facilities for off-loading business system information to Tivoli Data Warehouse and later to IBM Tivoli Service Level Advisor. This information includes business system hierarchical structures and the actual time spent in each of six states for every business system. IBM Tivoli Service Level Advisor operates on service offerings that are defined manually and have a set of metrics that is linked to the service when it is created.

Important: The practical approach to Tivoli Business Systems Manager and IBM Tivoli Service Level Advisor integration involves modeling the IBM Tivoli Service Level Advisor service offering structures on Tivoli Business Systems Manager services. Tivoli Business Systems Manager business system data can then be used for more accurate measurement of availability for each defined service offering, while IBM Tivoli Service Level Advisor can notify the corresponding Tivoli Business Systems Manager service of pending SLA violations and trending alerts.

3.8.2 Real-time monitoring

Tivoli Business Systems Manager accepts data from a variety of sources, including most industry monitoring products. In addition, it accepts data from major scheduling packages, including Tivoli Workload Scheduler. Tivoli Business Systems Manager supports both distributed and mainframe data sources.

Tivoli distributed monitors communicate with Tivoli Business Systems Manager either through IBM Tivoli Enterprise Console or directly. Tivoli distributed products monitor resource changes and respond by sending predefined events to IBM Tivoli Enterprise Console. Through IBM Tivoli Enterprise Console rules, these events are then forwarded to Tivoli Business Systems Manager via an agent listener. Tivoli Business Systems Manager has also instrumented many adapters for monitoring products. These products monitor instrumented environments and send resource changes directly to Tivoli Business Systems Manager via a common listener.
Monitoring products for distributed platforms deploy several techniques to capture resource changes and generate real-time events, such as log scanning adapters, SNMP managers, and IBM Tivoli Monitoring resource models. Each event is preclassified and assigned an alert state and priority.

Tivoli Business Systems Manager also provides an OS/390® adapter for monitoring mainframe environments. It can communicate with Tivoli Business Systems Manager via either IP or SNA protocols. It supports several data feeds such as z/OS, IMS, CICS, DB2, SA/390 automation, storage, WebSphere, network, and batch. The OS/390 adapter can capture console messages and timer-based polling events and generate predefined Tivoli Business Systems Manager events.
Important: Tivoli Business Systems Manager expands real-time event monitoring into real-time monitoring of resource states. It adds value by processing incoming events and recognizing their impact on the state of the corresponding resources. Using the business systems constructs and propagation rules, Tivoli Business Systems Manager combines the states of related resources and allows real-time monitoring of services.

3.8.3 Historical monitoring

In addition to sending real-time events to Tivoli Business Systems Manager, IBM Tivoli monitoring products collect measurement data. Each monitoring product stores its data in the product database and periodically transfers this historical data into Tivoli Data Warehouse using its WEP.

Tivoli Data Warehouse is a Tivoli product that offers a centralized database for all Tivoli product data. The schemas of this database are open and published. Systems management data from non-Tivoli products can also be integrated.

As described in 3.3, “IBM Tivoli Data Warehouse” on page 64, the central data warehouse database uses a generic schema that is the same for all applications. As new components or new applications are added, more data is added to the database. However, no new tables are added to the schema. Historical data stored in Tivoli Data Warehouse is aggregated as well as correlated and can be used for reporting by many third-party tools.

The latest Tivoli Business Systems Manager WEP provides three enablement options:
– IBM Tivoli Service Level Advisor integration
– Tivoli Data Warehouse reporting
– IBM Tivoli Service Level Advisor integration and Tivoli Data Warehouse reporting

Although the Tivoli Business Systems Manager WEP includes programs in support of all three options, the sequence in which the programs run depends on which option is selected. The Tivoli Business Systems Manager WEP includes both source and target ETLs.
The source ETL loads Tivoli Business Systems Manager data, such as managed resources, events, alert state changes, notes, and state transition measurements of business systems, into the central data warehouse database. The target ETL retrieves this data and loads it into the GTM schema in the datamart database.

Tivoli Business Systems Manager provides two options for reporting historical data via the same set of reports:
– Tivoli Business Systems Manager history server and reporting system that provide Tivoli Business Systems Manager ASP reports
– Reports available using the Tivoli Data Warehouse reporting interface: Crystal Enterprise Professional for Tivoli

Tivoli Business Systems Manager information in the central data warehouse database is also used by IBM Tivoli Service Level Advisor to generate SLA reports. IBM Tivoli Service Level Advisor uses a set of ETLs to extract data from the central data warehouse database to the SLM measurement data mart database for further analysis and reporting.

For details about Tivoli Data Warehouse and IBM Tivoli Service Level Advisor data sources, see Chapter 4, “Planning to implement service level management using Tivoli products” on page 109. Each data source has a unique code that identifies the product with which it is associated.

Important: Tivoli Data Warehouse facilitates the integration of historical data from Tivoli and third-party products through a centralized database and a set of supported WEPs. The main task is to install and schedule these WEPs. Because the size of a database depends on the size of the IT enterprise, it is critical to plan runs and estimate timings for each WEP.

3.8.4 Fault management

Tivoli Business Systems Manager processes real-time events that are captured from a variety of data sources, stores them in the Tivoli Business Systems Manager database, and posts the appropriate alerts to the corresponding physical resources. Each incoming event has a predefined alert state and priority and is identified with a specific resource instance.

Events affect the state of a resource. Tivoli Business Systems Manager propagates state changes upward to affect the resource’s parents and to facilitate the determination of the status of business views. Propagation is implemented by generating a child event to parent resources. Tivoli Business Systems Manager can regulate propagation through a number of propagation rules.
For details about propagation scenarios, see Chapter 4, “Planning to implement service level management using Tivoli products” on page 109.

Tivoli Business Systems Manager provides several technologies to visualize resources, business systems, events, relationships, and impact. Tivoli Business Systems Manager supports three types of consoles: Java Console, Web Console, and Executive Console. Each view and console is designed to add value in a particular way. When combined, they deliver a powerful mechanism for real-time fault management.
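The upward propagation of state changes described in this section can be pictured with a minimal sketch. It is not how the product is implemented. It assumes a simple "worst state wins" rule and three illustrative states (green, yellow, red), whereas the product's propagation rules are configurable and it tracks more states. All names in the sketch are hypothetical.

```python
# Illustrative severity ordering; the state names are assumptions of this sketch.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def propagate(parents, own_states):
    """Propagate each resource's state upward through its ancestors.

    parents: maps a resource name to its parent name (assumed acyclic).
    own_states: maps a resource name to the state set by its own events.
    Returns the effective state of every resource touched by propagation.
    """
    effective = dict(own_states)
    for resource, state in own_states.items():
        node = parents.get(resource)
        while node is not None:
            current = effective.get(node, "green")
            if SEVERITY[state] > SEVERITY[current]:
                effective[node] = state  # the worse child state wins
            node = parents.get(node)
    return effective
```

For example, if a hypothetical `db2_server` resource turns red and its parent is `order_app`, which in turn belongs to `order_service`, both ancestors become red even though a sibling Web server is still green.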
Tivoli Business Systems Manager is designed to manage events in the SLM context through automatic alert propagation to prebuilt and dynamically constructed business systems and services. Tivoli Business Systems Manager events are preclassified by resource class, alert state, priority, and event type. Most of the defaults can be customized via a GUI, and new resource classes and events can be added. For details about Tivoli Business Systems Manager events and their classification, refer to IBM Tivoli Business Systems Manager Administrator’s Guide, SC32-9085.

Tivoli Business Systems Manager provides management facilities, but a customer’s preparedness plays a significant role in achieving effective fault management. Some of the preparation activities are:
– Identify which events can cause outages; tune Tivoli Business Systems Manager red defaults
– Identify which events can cause degradation; tune Tivoli Business Systems Manager yellow defaults
– Consider business impact when constructing business systems
– Customize alert propagation rules to maximize alert management
– Find the best use of available views to match operational processes

Customers need to classify faults. Tivoli Business Systems Manager red alerts, particularly those of critical or high priority, can be classified as faults. Tivoli Business Systems Manager yellow alerts, and perhaps some red alerts of medium and low priority, can be classified as warnings.

Complete this preparation before rolling out Tivoli Business Systems Manager for production. Continuous adjustments and operational training help to improve the effectiveness of fault management and reduce the impact on service levels.

Important: A potential outage needs to be fixed as soon as possible to maintain SLA attainment. Faults may arrive at a rapid rate, and operators must respond to problems based on business impact. Prioritizing faults can greatly improve operators’ productivity and reduce problem investigation time.
Effective use of event, impact, and topology views to evaluate events and their impact is essential to efficient fault management.

3.8.5 SLA reporting and alerting

Evaluation of SLAs is one of the main functions of the IBM Tivoli Service Level Advisor product. IBM Tivoli Service Level Advisor automates service level assessment against the predefined thresholds and recognizes when SLAs are breached or about to be breached. In addition, IBM Tivoli Service Level Advisor provides management reports about the actual service levels, SLA violation statistics, and trends toward SLA violations.

IBM Tivoli Service Level Advisor depends on performance and availability data collected from a variety of monitoring and performance tools. This data is stored in the SLM measurement data mart, but all analysis and evaluation results are stored in the SLM database. You can retrieve the analysis data and summarize it into reports that you can view using a Web browser.

The SLM report console provides a color-coded, high-level summary report that is displayed in table form, showing totals of trends and violations across the reporting period, grouped by realms and customers. Clicking the table cells invokes accompanying color charts and additional tables of summary information about trends and violations, key operations information, and specific details about particular customers and SLAs. For more details, refer to IBM Tivoli Service Level Advisor SLM Reports, SC32-1248.

IBM Tivoli Service Level Advisor analyzes data that is obtained from Tivoli Data Warehouse according to a predefined schedule. This data is evaluated for violations and trends toward future violations of the agreed-upon levels of service. Notifications of violations and trends are sent automatically by way of e-mail, SNMP traps, or IBM Tivoli Enterprise Console events.

IBM Tivoli Service Level Advisor evaluates the aggregate data collected from Tivoli Data Warehouse against predefined breach values (for each metric and schedule state period) to determine whether service levels are being maintained. If the breach value is violated, IBM Tivoli Service Level Advisor generates a violation event. For example, the breach value defined for total is compared to the sum of all hourly values reported over the entire evaluation period. Accordingly, the breach value for maximum or minimum is compared to the highest or lowest single hourly value, respectively.
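The comparison of aggregated hourly values against breach values can be sketched as follows. This is an illustration only, not the product's evaluation logic. It assumes that maximum refers to the highest hourly value and minimum to the lowest, and that the caller states whether higher or lower values are worse for the metric. The function and parameter names are hypothetical.

```python
# Aggregations named in the text; each reduces the hourly values to one number.
AGGREGATES = {
    "total": sum,    # sum of all hourly values in the evaluation period
    "maximum": max,  # highest single hourly value
    "minimum": min,  # lowest single hourly value
}


def is_violated(hourly_values, kind, breach_value, higher_is_worse=True):
    """Return True if the aggregated value violates the breach value.

    higher_is_worse is an assumption of this sketch: response times are
    worse when higher, availability percentages are worse when lower.
    """
    observed = AGGREGATES[kind](hourly_values)
    if higher_is_worse:
        return observed > breach_value
    return observed < breach_value
```

For instance, with hourly response times of 1.2, 0.8, 3.5, and 1.0 seconds and a maximum breach value of 3.0 seconds, the single 3.5-second hour is enough to count as a violation, even though a total breach value of 10 seconds is not exceeded.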
IBM Tivoli Service Level Advisor uses a linear algorithm and an exponential stress detection algorithm to analyze existing measurement data and to predict trends toward violations. Both algorithms are active and evaluate the same data for trends according to their methods of evaluation. Because of the iterative estimations and calculations used by the exponential stress detection algorithm, no graphical trend line associated with this algorithm is displayed with graph data. Trend lines that are displayed with graphs are associated with the linear algorithm only.

If the predicted value approaches the breach value and is predicted to exceed the breach value by either the linear or the exponential stress detection algorithm, a trend detection event is reported. If there is an outstanding trend detection event and the current evaluation value is far from the breach value, a trend cancel event is reported. However, if a violation occurs after the trend detection event, a trend cancel event is never reported.
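To make the idea of a linear trend toward a violation concrete, here is a minimal sketch based on an ordinary least-squares line fitted to equally spaced measurements. It is an assumption-laden illustration, not the algorithm used by IBM Tivoli Service Level Advisor, and it does not attempt the exponential stress detection algorithm. It assumes that higher values are worse and that at least two measurements are available. The `horizon` parameter (how many periods ahead to project) is hypothetical.

```python
def linear_trend(values):
    """Fit y = intercept + slope * x by least squares, with x = 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def trend_toward_violation(values, breach_value, horizon):
    """Return True if no breach has occurred yet but one is projected.

    The projection extends the fitted line `horizon` periods past the
    last measurement (higher values are assumed to be worse).
    """
    slope, intercept = linear_trend(values)
    predicted = intercept + slope * (len(values) - 1 + horizon)
    return values[-1] <= breach_value and predicted > breach_value
```

With measurements of 1.0, 2.0, 3.0, and 4.0 and a breach value of 6, the line projects 7.0 three periods ahead, so a trend would be flagged. One period ahead it projects only 5.0, so it would not.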
IBM Tivoli Business Systems Manager V3.1 introduced the Executive View console, which provides a dashboard approach to presenting service status to executives. Optionally, a service can show status information for IBM Tivoli Service Level Advisor as the Secondary Impact Information (SII) indicator. SII indicators do not follow the “normal” Tivoli Business Systems Manager status propagation rules. The status of an SLA SII alert is shown by a symbol rather than by a color.

IBM Tivoli Service Level Advisor can send SLA trend and violation events to IBM Tivoli Enterprise Console, where they are trapped by an IBM Tivoli Enterprise Console rule and forwarded to Tivoli Business Systems Manager via the event enablement and the agent listener. SLA alerts are posted to the corresponding service object and can be viewed in the Executive Console as secondary impact indicators. In addition, SLA alerts can be forwarded automatically to people on the notification list via IBM Tivoli Enterprise Console e-mail and paging facilities.

Important: The actual evaluation takes place automatically when the IBM Tivoli Service Level Advisor ETL completes its operation of moving the most recent measurement data from the data warehouse into the SLM measurement data mart. However, IBM Tivoli Service Level Advisor also enables additional advanced settings for intermediate evaluations, frequency of trend analysis, and logging messages for missing data.

3.8.6 Problem and change management

Tivoli Business Systems Manager provides an integration function to create and track problem tickets. This includes opening and maintaining problem tickets that are stored and processed within a problem management application and automatically creating problem tickets when certain types of messages or exceptions are generated. Another area of integration is creating and tracking change requests.

The Tivoli Business Systems Manager integration function is implemented using request processors.
A request processor is any program or script that can process command line input parameters, read a text-based input file containing data passed from the Tivoli Business Systems Manager integration function, and create a text-based output file with the results received from the problem or change management system integrated with Tivoli Business Systems Manager.

The following types of request processors can be used:
– Problem request processor: This is any request processor that implements interfaces for entering data and generating requests to create, query, search, find, retrieve, and update problem tickets. The Tivoli Business Systems Manager problem integration function displays the menu options for the BSM problem ticket processing. Then it transfers control to the user-written program for integration with the user’s problem management application.
– Change request processor: This implements interfaces for entering data and generating requests to create, query, search, find, retrieve, and update change requests. The Tivoli Business Systems Manager change integration function displays the menu options for the Tivoli Business Systems Manager change request processing. Then it transfers control to the user-written program for integration with the user’s change management application.
– Automatic ticket request processor: This is any request processor written by users that can process command line input parameters, read a text-based input file containing the data passed from the Tivoli Business Systems Manager automatic ticket integration function, and create a text-based output file to contain the problem ID returned from the problem management application.

The automatic ticket integration function differs from the problem and change integration functions within the Tivoli Business Systems Manager product. It does not have a console interface. Its sole function is to create problem tickets and optionally generate automatic notifications by pager or e-mail.

The automatic ticket integration function interacts with a user’s request processor when message or exception events are sent to Tivoli Business Systems Manager. All events are processed by the automatic ticket integration function based on predefined automatic ticket event rules that provide criteria for passing the matched events to the request processor.
When the Tivoli Business Systems Manager console is set up to work with problem and change management systems, the user can perform the following tasks:
– Create, find, update, and close problem tickets. Two types of create are supported (from the context menu of a resource and from an ownership note).
– Create, find, update, and close change requests.

Important: Tivoli Business Systems Manager provides integration functions and request processors for problem, change, and automatic ticketing. Users must develop their own customized programs that can interface with their change and problem management systems. Most problem and change management applications provide some type of API. After a Tivoli Business Systems Manager request is processed, interface programs must return control to the Tivoli Business Systems Manager exit point and provide notification of results.
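The general shape of a user-written request processor, as described in this section, can be sketched as follows. Everything specific here is an assumption for illustration: the `KEY=VALUE` layout of the input file, the `PROBLEM_ID` output field, the order of the command line parameters, and the `create_ticket_stub` function that stands in for a call to a real problem management application. Consult the product documentation for the actual file formats and parameters.

```python
def parse_request(text):
    """Parse KEY=VALUE lines (an assumed format) into a dictionary."""
    fields = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def create_ticket_stub(fields):
    """Hypothetical placeholder for the problem management application's API."""
    return "TICKET-0001"


def process_request(input_path, output_path, create_ticket):
    """Read the request file, create a ticket, and write the result file."""
    with open(input_path, encoding="utf-8") as handle:
        fields = parse_request(handle.read())
    ticket_id = create_ticket(fields)
    with open(output_path, "w", encoding="utf-8") as handle:
        handle.write(f"PROBLEM_ID={ticket_id}\n")
    return ticket_id


def main(argv):
    # Assumption: argv[1] is the input file path and argv[2] the output file path.
    process_request(argv[1], argv[2], create_ticket_stub)
    return 0
```

A script built this way would be started with `main(sys.argv)`. Passing the ticket-creation function as a parameter keeps the file handling separate from the vendor-specific API call, which makes the processor easier to test without a live problem management system.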
Chapter 4. Planning to implement service level management using Tivoli products

The starting point for this chapter is that a decision has been made to implement service level management (SLM) in accordance with IT Infrastructure Library (ITIL) recommendations. It also assumes that IBM Tivoli Business Systems Manager and IBM Tivoli Service Level Advisor are used as key parts of the overall solution. The chapter is written from the perspective of an IT consultant assigned to plan and implement a solution. It covers the following topics:
– An overview of the SLM process introduced in Chapter 2, “General approach for implementing service level management” on page 23, with each stage described in the context of IBM Tivoli products
– In-depth technical overview of the IBM Tivoli products that are used for SLM
– In-depth technical description of selected new features of IBM Tivoli Business Systems Manager V3.1 and IBM Tivoli Service Level Advisor V2.1 that are exploited for SLM
– Brief overview of additional IBM Tivoli products that are used for SLM
4.1 Implementing SLM using Tivoli products

This section reviews the stages of implementing SLM described in Chapter 2, “General approach for implementing service level management” on page 23. It describes each stage in the context of using the IBM Tivoli products introduced in Chapter 3, “IBM Tivoli products that assist in service level management” on page 53. It explains briefly how IBM Tivoli products contribute to each stage of the SLM implementation process. Figure 4-1 illustrates the planning, implementation, on-going SLM program, and improvement process stages.

Figure 4-1 SLM processes implementation approach (stages shown: Planning, Implementation, On Going SLM program, Improvement Process)
4.1.1 Planning

During the planning stage, you should become familiar with the capabilities and features of the IBM Tivoli products that are available to you. You must also become familiar with any new products and revise perceptions of existing and installed products. What may now be an under-used event monitor may well become a key tool in SLM. This idea is explored further in “Understanding the services” on page 111 and “Implementing additional monitoring” on page 113.

Defining the key players

Establish the providers and customers of SLM. Establish who will use SLM tools and their roles. When the users and roles are established, map them to the users and roles provided in IBM Tivoli Business Systems Manager and IBM Tivoli Service Level Advisor.

The IBM Tivoli Business Systems Manager and IBM Tivoli Service Level Advisor user roles are described further in 4.2.6, “IBM Tivoli Business Systems Manager roles in an SLM context” on page 132. Practical application of these roles is detailed in Part 2, “Case study scenarios” on page 195.

Understanding the services

Understanding the services is a key part of SLM implementation. It is also particularly important to the IBM Tivoli Business Systems Manager implementation. See Chapter 2, “General approach for implementing service level management” on page 23, “Business process-based IBM Tivoli Business Systems Manager business systems” on page 122, and “Data gathering and business system decomposition” on page 134.

Assessing the ability to deliver

It is important to analyze the infrastructure to assess its capability for providing the services defined in the previous steps. It is also important to know which applications can monitor the various variables of that infrastructure. Refer to Chapter 3, “IBM Tivoli products that assist in service level management” on page 53, for a brief description of some of the Tivoli monitoring applications that are available.
At this point, you can define an initial target for the level of service. For example, a service level agreement (SLA) for service A states that it has to be available for 99% of the time with a reporting period of one month. Review this initial target regularly because working toward an obviously unreachable target is unrewarding. You can use IBM Tivoli Service Level Advisor to gather basic metrics for this service. As new feeds and processes are introduced, you can change the SLA to suit the organization’s ability to deliver.
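To make the 99% example concrete, the following minimal sketch converts an availability target into a downtime budget for the reporting period and computes the availability achieved for a given amount of downtime. The function names, the 30-day month, and the 9 hours of downtime are assumptions chosen for illustration; they are not taken from IBM Tivoli Service Level Advisor.

```python
def allowed_downtime_hours(target_pct: float, period_days: int) -> float:
    """Downtime budget implied by an availability target over a reporting period."""
    period_hours = period_days * 24
    return period_hours * (100.0 - target_pct) / 100.0


def achieved_availability_pct(downtime_hours: float, period_days: int) -> float:
    """Availability actually delivered, given the total downtime in the period."""
    period_hours = period_days * 24
    return 100.0 * (period_hours - downtime_hours) / period_hours


# Assumed values: a 30-day month and 9 hours of recorded downtime.
budget = allowed_downtime_hours(99.0, 30)
achieved = achieved_availability_pct(9.0, 30)
print(f"Downtime budget at 99%: {budget:.1f} hours")
print(f"Achieved availability: {achieved:.2f}%")
```

Under these assumptions, a 99% monthly target leaves a budget of about 7.2 hours, so 9 hours of downtime misses the target. A calculation like this is a quick way to judge whether an initial target is reachable before it is written into the SLA.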
  • 4.1.2 Implementation

The implementation phase is when you install new Tivoli products and review existing Tivoli and other systems management products for SLM.

Developing service level objectives

After you understand the services, you can begin to define service level objectives (SLOs) for them. You define the SLOs in terms of the information available from the infrastructure. This means that you must base the objectives on what can be measured by the tools that are available. For this reason, we recommend that you review the SLO definitions whenever new monitors are introduced, because a new monitor can bring in new metrics that enable a different measurement of a service to be taken.

You can use two types of metrics: external and internal. When developing SLOs, it is important to differentiate between them. External metrics are defined in the SLA contract. They are visible to the customer. An example of an external metric is Overall Response Time of Service. Internal metrics are accessory metrics from system monitors that the service provider can use proactively to ensure that the contract is being met. Internal metrics are not shown to the customer and are not part of the SLA contract. An example of an internal metric is Response time of DB2 Databases used by the Application.

Negotiate on service level agreements

After you develop the SLOs, negotiate the SLA. As in any negotiation, it is important that you have all the information available for this step. The most important information is the current level of the service based on the metrics that were chosen in the previous step. You obtain this information by evaluating the historical data.
Assuming that the monitor applications have been collecting information from the infrastructure for some time, you can use the IBM Tivoli Service Level Advisor function to see retrospectively how you are doing. To see how to implement this, refer to 4.4.1, “Building SLAs in IBM Tivoli Service Level Advisor” on page 156. After the negotiation, you may want to review and adjust the SLA that was created.
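As an illustration of evaluating historical data before a negotiation, the sketch below summarizes a set of past response-time samples against a proposed target. The sample values, the 2-second target, and the `baseline` helper are hypothetical; IBM Tivoli Service Level Advisor performs its own evaluation, and this only shows the kind of question the historical data can answer.

```python
import math
import statistics


def baseline(samples, proposed_target):
    """Summarize historical samples of a 'lower is better' metric against a target."""
    ordered = sorted(samples)
    # Nearest-rank 95th percentile: the smallest sample with at least
    # 95% of all samples at or below it.
    rank = math.ceil(0.95 * len(ordered))
    within = sum(1 for s in ordered if s <= proposed_target)
    return {
        "mean": statistics.mean(ordered),
        "p95": ordered[rank - 1],
        "pct_within_target": 100.0 * within / len(ordered),
    }


# Hypothetical response times (seconds) collected before the negotiation.
history = [1.2, 1.4, 1.1, 2.0, 1.3, 1.5, 3.1, 1.2, 1.6, 1.4]
summary = baseline(history, proposed_target=2.0)
print(summary)
```

With these made-up samples, nine out of ten measurements meet a 2-second target, which suggests that a contract requiring 2 seconds for every transaction would already be in violation, while a target expressed over 90% of transactions would not.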
  • Implementing additional monitoring

This is an extremely critical stage and a prerequisite for SLM. It covers the following tasks:

– Increase the rollout of existing systems management tools to cover gaps in monitoring. The business process decomposition may reveal gaps in monitoring. Determine whether these can be filled by your existing systems management tools.

– Re-assess, re-invent, and exploit existing systems management solutions to cover gaps in monitoring. This is an extension of the previous task. Most systems management tools have features and functions that are not exploited. Re-assess all the existing systems management tools to see if further exploitation can cover the monitoring gaps.

– Review and re-engineer existing systems management solutions to ensure event quality. IBM Tivoli Business Systems Manager and IBM Tivoli Service Level Advisor can only be as good as the information that is sent to them. If every event, trivial or critical, sent by the monitors is marked as critical, then there is no way to truly assess the business impact of the events. Every business system is marked as critical, and the management of the business processes is essentially blind. It is imperative that events sent from the monitors reflect the true severity of the event on the component, conform to message ID standards and, ideally, have a corresponding goodness event to close the original event if the bad situation no longer applies. Standardizing events is often substantial work, but it is necessary work if SLM is to be successful.

– Implement new IBM Tivoli Monitoring products to cover gaps in monitoring. Some of the monitoring gaps may not be covered by the existing systems management skills or products. Use IBM Tivoli Monitoring products to cover the remaining gaps. Examples are:
   – IBM Tivoli Monitoring
   – IBM Tivoli Monitoring for Database
   – IBM Tivoli Monitoring for Business Integration
   – IBM Tivoli Monitoring for Web Infrastructure
  These products measure the internal performance of systems and applications. The functionality includes continuous monitoring and recording of information, raising alerts when thresholds are exceeded, and gauging user experience by making response time measurements. These products can monitor hardware, databases, and applications.
  • – Implement IBM Tivoli Monitoring for Transaction Performance to provide user-experience monitoring. User-experience monitoring is key to providing an end-to-end view of a service. Implementing and exploiting IBM Tivoli Monitoring for Transaction Performance is explained in 4.5.1, “IBM Tivoli Monitoring for Transaction Performance” on page 190, and in Part 2, “Case study scenarios” on page 195.

Implementing SLM analytical and automation tools

This is the actual implementation stage of IBM Tivoli Business Systems Manager and IBM Tivoli Service Level Advisor. In this stage, you also implement any required supporting tools such as Tivoli Data Warehouse and IBM Tivoli Enterprise Console (TEC). Details of implementation are covered in Part 2, “Case study scenarios” on page 195.

Establishing a reporting function

Reports in this solution are on demand. You can request them to see the status of the services at any point in the evaluation period. The main task here is to define the various users and the access they have to the information in the solution. For details about how to do this, see “Reports” on page 164. After you create the users, check the available IBM Tivoli Service Level Advisor reports to ensure that the users can see what they need to see. For examples of the views that are available to the various users and roles, see Part 2, “Case study scenarios” on page 195.

Adjusting IT processes to include SLM

Sometimes it is necessary to revise operational processes and practices to ensure that SLM data is accurate. An example of this is to ensure that the state of the system or application is not considered during maintenance periods because it may affect its overall availability. Another example is to revise the change process as required.
This ensures that the SLM tools are included in the scope of changes so that business systems and SLAs can be changed accordingly.

4.1.3 Ongoing SLM program

This task covers continuous monitoring, reporting, and reviewing of the SLAs. The main idea here is to be proactive and identify possible problems in the infrastructure before they impact the SLA at the end of the evaluation period.
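One way to picture this proactive approach is to fit a straight line through recent measurements of a metric and estimate when that line would cross the breach value. The sketch below does this with an ordinary least-squares fit. It is an illustration only and is not the algorithm used by IBM Tivoli Service Level Advisor; the function names, the sample data, the breach value, and the 7-day warning interval are all assumptions.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_breach(days, values, breach_value):
    """Estimate days after the last sample until a 'higher is worse' metric
    crosses breach_value. Returns None if the fitted line is not rising."""
    slope, intercept = linear_fit(days, values)
    if slope <= 0:
        return None
    crossing_day = (breach_value - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical daily average response times (seconds) and breach value.
days = [1, 2, 3, 4, 5]
response_times = [1.0, 1.2, 1.4, 1.6, 1.8]
remaining = days_until_breach(days, response_times, breach_value=3.0)

WARNING_INTERVAL_DAYS = 7  # assumed look-ahead interval
if remaining is not None and remaining <= WARNING_INTERVAL_DAYS:
    print(f"Trend warning: breach projected in about {remaining:.0f} days")
```

The value of such a projection is that it arrives while there is still time to act, rather than after the evaluation period has closed with a violation.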
  • Many IBM Tivoli Service Level Advisor capabilities support this proactive approach.

Trends toward violations

IBM Tivoli Service Level Advisor calculates trending toward violations for any metric selected to be part of an SLA. It analyzes the data for the metric and sends a trend event when the algorithm detects that the data shows a linear or stress exponential trend that may lead to a violation within a predetermined interval. See Chapter 5, “Case study scenario: IRBTrade Company” on page 197, for an example.

Intermediate evaluations

These evaluations are done more frequently than the evaluation for the reporting period. A common situation is a monthly evaluation and a daily intermediate evaluation. With this, the IT organization can check every day on the status of the various services it is providing and take action while it is still possible to affect the SLA result at the end of the month. For details about this function, refer to Part 2, “Case study scenarios” on page 195.

Adjudication

Some violations occur under conditions that, according to the SLA contract, can be adjudicated. For example, if the number of users of a certain application exceeds the number stated in the contract, the violation for the month can be adjudicated. Refer to “Adjudication” on page 170 for details.

4.1.4 Improvement process

SLM is a continuous process, and improvement opportunities do not end.

Reviewing service requirements changes

As mentioned earlier, it is important that changes to the environment are reflected in the SLM tools. You can use IBM Tivoli Business Systems Manager to enhance change requests, and it should be closely involved in planning service changes. By using the Business Impact view on an object within IBM Tivoli Business Systems Manager, it is possible to see every business process that can be affected by the change and manage the change accordingly.
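The idea behind assessing business impact before a change can be sketched as a walk up a dependency graph: starting from the component being changed, collect every business system that directly or indirectly contains it. The example below is a conceptual illustration only and does not use any IBM Tivoli Business Systems Manager interface; the component names and the `PARENTS` mapping are hypothetical.

```python
from collections import deque

# Hypothetical containment data: each object maps to the business systems
# that directly contain it.
PARENTS = {
    "db2-prod-01": ["Trading Application", "Reporting Application"],
    "Trading Application": ["Order Processing"],
    "Reporting Application": ["Month-End Close"],
    "Order Processing": [],
    "Month-End Close": [],
}


def impacted(component, parents):
    """Return every ancestor of the component, found by breadth-first search."""
    seen = set()
    queue = deque([component])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(impacted("db2-prod-01", PARENTS)))
```

In this made-up structure, a change to one database reaches two applications and, through them, two business processes, which is the kind of information a change request needs to carry.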
When a service change requires new components, ensure that the new components are added to the IBM Tivoli Business Systems Manager business system before or when the change becomes active. If a new component is added before it becomes live, use the IBM Tivoli Business Systems Manager Maintenance function to suppress event propagation from the object while it is in test. This function is described in IBM Tivoli Business Systems Manager Administrator’s Guide, SC32-9085.
  • Decommissioning resources is not reflected in IBM Tivoli Business Systems Manager. A decommissioned object remains in the business system and no longer receives events. Decommissioned objects in business views have no effect on the continued functioning of IBM Tivoli Business Systems Manager. You can clean them up as a maintenance task to avoid accumulating too many decommissioned objects.

You can use Automatic Business Systems (ABS) and Extensible Markup Language (XML) Business System building to ensure that changes to the service are reflected in IBM Tivoli Business Systems Manager. Failure to reflect service changes in IBM Tivoli Business Systems Manager reduces the effectiveness of SLM. Continued failure compromises SLM and renders the monitoring and metrics useless.

Reviewing and adjusting SLOs and SLAs

An SLA should have a periodic review defined in the SLA contract. During the periodic review period, you can make changes to the SLA to accommodate changes to the service without distorting the measurements. Examples of changes include:

– Changing breach values to accommodate new needs. This can be the result of a review, where more powerful resources were requested and the breach values were changed to reflect a higher level of service. For details, see Part 2, “Case study scenarios” on page 195.

– Metrics. Review the metrics that make up the SLO so that the value of the SLO is more tangible to the receiver of the service.

– Maintenance period. Set up new maintenance periods. You must change the schedule to accommodate new maintenance dates. See “Maintenance schedule” on page 175.

Making adjustments

Replacements and improvements to resources may be necessary to maintain or reach the desired level of service. Also, there may be cases when the desired service levels are unrealistic given the existing infrastructure and costs. In this case, adjust SLAs accordingly.
To implement this, see “Changes to service level agreements” on page 169.
  • Improving the SLM processes

The SLM process includes continuous evaluation and improvement. Areas of improvement include:

– Changing the intermediate evaluation frequency
– Reducing the time to implement a change that can affect the SLA evaluation outcome
– Changing the number of people monitoring the SLAs
– Adjusting separate SLA responsibilities per business unit
– Creating customized Microsoft Excel reports
– Adding more internal metrics to improve diagnostics, trends, or management

4.2 IBM Tivoli Business Systems Manager V3.1

IBM Tivoli Business Systems Manager is IBM’s core business systems management product. This section introduces IBM Tivoli Business Systems Manager and provides a high-level overview of some IBM Tivoli Business Systems Manager concepts and features. It also provides in-depth examples of several IBM Tivoli Business Systems Manager features new in Version 3.1.

IBM Tivoli Business Systems Manager provides a common management console for users and roles across the enterprise, from operations, through technical specialists and service management, right up to executives. It provides operations with a view of system components as they relate to the business. It also provides service management and executives with a high-level view of the status of predefined services across the enterprise.

IBM Tivoli Business Systems Manager receives systems management information from a large range of monitoring products on both z/OS and distributed systems. It also integrates with TEC and most IBM Tivoli Monitoring products to provide the ability to build consolidated views of the enterprise.

IBM Tivoli Business Systems Manager uses data structures called business systems. Business systems are built from objects defined to IBM Tivoli Business Systems Manager. Objects represent instances of the enterprise hardware and software components. Business systems can be built as models of actual business processes.
Systems management tools pass events to IBM Tivoli Business Systems Manager. These events are mapped to the actual object affected by, or that is issuing, the event. If the object is a component of a business process and it is built into a business system, then the received event is overlaid onto the object in the business system. This gives operations a graphical representation of the business process and the context of the event that is affecting it.
  • An event that affects a core business process causes the business system to be overlaid with a red or yellow icon (see the following section) indicating the impact of the event on the business process. A similar event that affects a non-critical component does not light up the business system. Because IBM Tivoli Business Systems Manager graphically shows the event in the correct context, you can judge the impact and direct resolution efforts accordingly.

4.2.1 Propagation, alerts, and events

Events posted to IBM Tivoli Business Systems Manager set the receiving object to have an alert state and priority. The alert state of an object is its color: red, yellow, or green. The priority of an object is an indication of its severity. The range and order of priorities is:

– Critical
– High
– Medium
– Low
– Ignore
– Inherit from event

The default priority for objects is inherit from event. This causes the object to be overlaid with the alert state and priority carried by the received event. Where many exceptions are sent to an object, the object’s alert state and priority are set by the highest received event.

The combination of alert state and priority means that IBM Tivoli Business Systems Manager can have many different event types. The practical range of events that are used by IBM Tivoli Business Systems Manager is from low yellow to critical red. Each different alert state and priority combination in the practical range can be treated differently by individual objects in IBM Tivoli Business Systems Manager.

The alert state and priority of an object determine the propagation of events sent to it. Propagation is the process of overlaying received events onto an object and, if required, sending the event further up the business system tree. If the event is propagated up the tree, then it is considered to be a child event to the objects further up the tree. Propagation settings are customizable at object level. See “Resource level propagation” on page 136 for more details.
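The propagation behavior described above can be pictured with a small tree model in which each object keeps the events posted to it, reports the highest one as its status, and forwards an event to its parent only if the event meets that object's own propagation setting. This is a conceptual sketch and not how IBM Tivoli Business Systems Manager is implemented. The `propagate_from` setting, the object names, the default status for an object with no events, and the choice to compare alert state before priority are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

ALERT_RANK = {"green": 0, "yellow": 1, "red": 2}
PRIORITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def rank(event):
    """Order events by alert state first, then priority (an assumed ordering)."""
    alert_state, priority = event
    return (ALERT_RANK[alert_state], PRIORITY_RANK[priority])


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    # Hypothetical per-object setting: the lowest event passed to the parent.
    propagate_from: tuple = ("yellow", "low")
    events: list = field(default_factory=list)

    def status(self):
        """The highest received event sets the object's alert state and priority."""
        return max(self.events, key=rank) if self.events else ("green", "low")

    def post(self, event):
        """Overlay the event on this object and, if it qualifies, send it up the tree."""
        self.events.append(event)
        if self.parent is not None and rank(event) >= rank(self.propagate_from):
            self.parent.post(event)


process = Node("Order Processing")
application = Node("Trading Application", parent=process, propagate_from=("red", "high"))
server = Node("TRDWEB01", parent=application)

server.post(("yellow", "high"))      # reaches the application, stops there
print(process.status())
server.post(("red", "critical"))     # propagates all the way up
print(process.status())
```

In this sketch the yellow event is visible on the server and the application but leaves the business process green, while the critical red event lights up every level. Per-object settings of this kind are what allow a non-critical component to fail without alarming the owners of the business process.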
IBM Tivoli Business Systems Manager has two types of events that it can post to objects: messages and exceptions.

Messages are state changes. An object can be in only one state at a time, such as Up. A state change changes the state of the object so that it becomes another state, such as Abended. Similarly, only one message can apply to an object at any time. Messages are often, but not exclusively, state change events that set the status of the object. Messages are never cleared but are overlaid with other messages of the same or greater priority. For example, a high red message is overlaid with a high green message, sending the affected object to a green alert state.

Exceptions are more flexible. Any number of exceptions can apply to a single object. Most events from system management tools are posted as exceptions by IBM Tivoli Business Systems Manager. Exceptions are not overlaid by other exceptions unless the exception has an identical exception ID. In that case, the exception count increments. Outstanding exceptions can be cleared automatically when the problem is resolved by sending the same exception with the exception text of OK.

For details about message and event handling, see IBM Tivoli Business Systems Manager Administrator’s Guide, SC32-9085.

4.2.2 Basic business system building

This section discusses the available methods of building business systems.

Drag and Drop

Drag and Drop business system creation is quick and easy to use. However, large and complex business systems are time consuming to build using Drag and Drop. Up to 20 objects can be dragged and dropped at one time. Drag and Drop is a good method for building complex business systems in environments where naming standards cannot be relied upon (see the following section). However, Drag and Drop business systems do not automatically update for newly discovered objects and present a constant maintenance overhead.

Drag and Drop business systems have their uses. For production implementations where the currency of business systems is critical, we recommend that you use ABS and XML for business system building.

Automatic Business Systems

Automatic Business Systems (ABS) has been available in IBM Tivoli Business Systems Manager since Version 2.1.
IBM Tivoli Business Systems Manager V3.1 contains enhancements for ABS that allow it to exploit the new features of IBM Tivoli Business Systems Manager V3.1, such as resource level propagation and the executive dashboard. ABS requires you to know the design of the business system up front because configuration is required to define ABS builds. ABS relies heavily on attribute naming conventions and cannot be easily achieved if naming standards are not consistent.
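To show why consistent naming matters for rule-based business system building, the sketch below places discovered objects into business systems by matching object type and name pattern. It does not reproduce the ABS configuration format, which is documented in the Administrator’s Guide; the rule structure, the `TRD` naming prefix, and the object names are hypothetical.

```python
import re

# Hypothetical rules: object type + name pattern -> target business system.
RULES = [
    {"type": "DB2 Database", "name": r"TRD.*", "business_system": "Trading/Databases"},
    {"type": "Windows Server", "name": r"TRD.*", "business_system": "Trading/Servers"},
]


def place(objects, rules):
    """Return (placed, unplaced): objects grouped by business system, and leftovers."""
    placed, unplaced = {}, []
    for obj in objects:
        targets = [
            rule["business_system"]
            for rule in rules
            if rule["type"] == obj["type"] and re.fullmatch(rule["name"], obj["name"])
        ]
        if not targets:
            unplaced.append(obj["name"])
        for target in targets:
            placed.setdefault(target, []).append(obj["name"])
    return placed, unplaced


discovered = [
    {"name": "TRDDB01", "type": "DB2 Database"},
    {"name": "TRDWEB01", "type": "Windows Server"},
    # A trading database whose name ignores the assumed naming standard.
    {"name": "DB2PROD7", "type": "DB2 Database"},
]
placed, unplaced = place(discovered, RULES)
print(placed)
print(unplaced)
```

The third object belongs to the same service but does not follow the naming standard, so no rule can place it. This is the practical cost of inconsistent naming: objects are silently missing from the business system unless someone adds them by hand.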
  • ABS-created business systems are dynamically built and populated with all qualifying existing objects as defined in the ABS rules. Maintenance for keeping business systems up to date is especially low because newly discovered and created objects are automatically placed in business systems by ABS. For instructions on using ABS, see IBM Tivoli Business Systems Manager Administrator’s Guide, SC32-9085.

XML

XML-built business systems are a new component introduced in IBM Tivoli Business Systems Manager V3.1. This feature allows business systems to be built and updated using XML and to be extracted and backed up as XML files. The XML method was not used for this IBM Redbook. You can learn more about this method in IBM Tivoli Business Systems Manager Administrator’s Guide, SC32-9085.

4.2.3 Best practices for business system building

Building effective business systems is an iterative process. The best practice is to use ABS, XML, or both wherever possible to reduce maintenance overhead.

Business system building can produce a brief performance overhead on the IBM Tivoli Business Systems Manager system. This is normally minimal and not noticeable to IBM Tivoli Business Systems Manager users. However, take care when implementing large ABS or XML business systems because the initial business system population may impact users.

Business systems can be nested to a maximum of six levels. Fewer levels are better because each extra nesting level increases the propagation workload. We recommend that you do not nest a business system under another copy of the same business system.

Business system names are important. ABS uses business system names as the main reference for building business system structures. Duplicated business system names cause unpredictable ABS results.

Business System Shortcuts

In previous versions of IBM Tivoli Business Systems Manager, you could produce many copies of the same business system and make a business system a child of it.
This was an undesirable situation that created many performance problems. In IBM Tivoli Business Systems Manager V3.1, Business System Shortcuts (BSS) are introduced to control the number of copies of business systems.
  • BSS are copies of a parent business system. The objects in the BSS are the same objects as in the parent business system. They are not duplicates. Most of the properties of the parent business system are inherited by the BSS, but you can change these properties in the BSS. If you change the parent’s properties, then the change is reflected in the child BSSs. You can unlink the properties of a child BSS and change them to suit the requirements placed upon the BSS. If required, you can relink the child’s properties back to the parent so that the child has the parent’s properties once again.

Some properties are not inherited by the child BSS. A business system that is defined as an Executive View Service does not automatically pass on this property to a child BSS.

We used BSS to allow different propagation rules to apply to the same business system so that different roles can get different information from the same business system structure. Chapter 6, “Case study scenario: Greebas Bank” on page 315, offers more information about exploiting BSS.

4.2.4 IBM Tivoli Business Systems Manager business system types

IBM Tivoli Business Systems Manager supports two types of business systems: technology based and business process based. Both types are identical in behavior but differ in ease of build and use.

Technology-based IBM Tivoli Business Systems Manager business systems

The simplest business system to build in IBM Tivoli Business Systems Manager is a technology-based business system. It contains objects of the same object type, representing one technology, such as CICS regions, Windows 2000 servers, or DB2 databases.

Figure 4-2 shows an example of a technology-based business system. It is built simply by including all required CICS region objects under the parent BSV folder. This is done by using ABS rules, XML BSV definition, or Drag and Drop.
Technology-based business systems are particularly easy to build using ABS because they are built by including all instances of the same object type regardless of the name. This process can be done for any technology tower that exists as an object type within the IBM Tivoli Business Systems Manager (TBSM) database.
  • Figure 4-2 Example of technology-based TBSM business system view

Business process-based IBM Tivoli Business Systems Manager business systems

A business process-based IBM Tivoli Business Systems Manager business system has a more complex construction than the technology-based business system. It is effectively a model of a real business process with all IBM Tivoli Business Systems Manager objects representing all the monitored components of the real business process.
  • Figure 4-3 shows a schematic diagram of a business process business system. It shows the business process broken down into functions and the functions broken down into applications. The applications are made up of aggregations of technologies, such as servers and databases. Underneath the aggregation layer is the technology layer that represents the actual hardware and software. The monitors layer shows the feeds that go into IBM Tivoli Business Systems Manager. It does not represent components of the IBM Tivoli Business Systems Manager business system.

Figure 4-3 Business process-orientated business system

One of the most challenging parts of IBM Tivoli Business Systems Manager implementation is correctly identifying the components that make up the business process. Processes for gathering the necessary business process information are discussed in Chapter 2, “General approach for implementing service level management” on page 23, and in “Data gathering and business system decomposition” on page 134.
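The layered decomposition described for Figure 4-3 can be written down as a nested structure, which is often a useful intermediate step when gathering business process information. The sketch below is a hypothetical model, not a format that IBM Tivoli Business Systems Manager reads; the names and the dictionary layout are assumptions. It shows how to check the depth of a decomposition and list the technology components at the bottom.

```python
# Hypothetical decomposition following the layers of Figure 4-3.
business_system = {
    "name": "Online Trading", "layer": "business process", "children": [
        {"name": "Place Order", "layer": "function", "children": [
            {"name": "Trading Application", "layer": "application", "children": [
                {"name": "Trading Databases", "layer": "aggregation", "children": [
                    {"name": "TRDDB01", "layer": "technology", "children": []},
                    {"name": "TRDDB02", "layer": "technology", "children": []},
                ]},
                {"name": "Trading Servers", "layer": "aggregation", "children": [
                    {"name": "TRDWEB01", "layer": "technology", "children": []},
                ]},
            ]},
        ]},
    ],
}


def depth(node):
    """Number of levels from this node down to its deepest descendant."""
    return 1 + max((depth(child) for child in node["children"]), default=0)


def technology_components(node):
    """Names of all technology-layer objects beneath this node."""
    if node["layer"] == "technology":
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(technology_components(child))
    return names


print(depth(business_system))
print(technology_components(business_system))
```

Checking the depth early is worthwhile because deep nesting adds propagation work, and listing the leaf components gives a direct checklist of what must be monitored for the business process to be represented faithfully.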
  • This type of business system can be built by using ABS. However, the objects within scope must conform to naming standards so that they can be correctly placed by ABS. You can use XML to build the business system. This method is especially effective if you can obtain an XML extract of the component from a federation of monitoring databases or some other repository that contains details about the business process.

Figure 4-4 shows an example of a business process-based business system. For clarity, this view is only partially expanded.

Figure 4-4 View of business process-based business system
  • 4.2.5 IBM Tivoli Business Systems Manager views in an SLM context

IBM Tivoli Business Systems Manager has many different views available to users. This section discusses the most popular views and how you can use them in the context of SLM.

Tree view

The IBM Tivoli Business Systems Manager tree view is the base view of IBM Tivoli Business Systems Manager. The Business Systems view and All Resources view are in tree format, and all business systems open as a tree view by default. The tree view is useful for the administrator to manipulate logic within the business system structure. The tree view is less useful for operational management of the components in the business system. Refer to Figure 4-4 to see the partially expanded tree view of a business system.

Event Viewer

To help users quickly use and understand IBM Tivoli Business Systems Manager, the tree view can be enhanced with the IBM Tivoli Business Systems Manager Event Viewer. Figure 4-5 shows the IBM Tivoli Business Systems Manager Event Viewer for CICS events.

Figure 4-5 Using the IBM Tivoli Business Systems Manager Event Viewer

The IBM Tivoli Business Systems Manager Event Viewer shows events in a linear way, similar to traditional systems management tools. This enables users to use IBM Tivoli Business Systems Manager quickly, without having to change working practices to adapt to IBM Tivoli Business Systems Manager. Note that, in Figure 4-5, the columns were resized and rearranged to make the view of events more user friendly. From this view, users can take ownership of events, close out unnecessary events, and see who owns existing events.

Hyperview

Hyperview is a dynamic, real-time view of an exploded business system. This view offers a quick overview of a business system. Because the hyperview always recenters on the point where the user clicks, it is a volatile view and can accidentally obscure events in the hyperview.

Figure 4-6 shows a hyperview for a business system. The default for hyperview is a minimum alert state of green. This means that every object is shown. We recommend that you change this default because the console display becomes too busy.

Figure 4-6 Hyperview set to show the minimum alert state of green
  • Topology view

The topology view is automatically built from business systems. It can be used to display a business system and its components or simply the high-level icon for the business system.

Where the hyperview is volatile, the topology view is static. Both views are real time and display events as they are received.

Figure 4-7 shows the same business system as shown earlier, but this one shows the general topology view. An option is available to show all details as in the hyperview, but the icons shrink as the view expands and the desktop becomes more difficult to use.

Figure 4-7 Topology view of business system: Not all detail enabled
  • IBM Tivoli Business Systems Manager also provides complex topology views for some mainframe feeds, such as CICS, IMS, and DB2. Technical support teams can use these views. For IBM Tivoli Business Systems Manager V3.1, IMS and DB2 topologies are new and the CICS topology view no longer requires CICSplex to be implemented. See Figure 4-8.

Figure 4-8 Sample IMS topology view

For details about exploiting the IBM Tivoli Business Systems Manager topology view, see IBM Tivoli Business Systems Manager Administrator’s Guide, SC32-9085.

Work spaces

The IBM Tivoli Business Systems Manager console can consist of several windows that contain any or all of the previously mentioned views. The IBM Tivoli Business Systems Manager administrator typically creates a set of views that are suitable for a role such as an operator or a database specialist. The administrator then saves the set of views in a work space. A work space can be assigned to specific operator and restricted operator IDs so that only these users can see the views. The administrator can also set work spaces to open on console startup.

Most IBM Tivoli Business Systems Manager window examples in this document show work spaces. Figure 4-9 shows an example work space set up for three business systems, using an Event Viewer in another window to give an overview of all three business systems.

Figure 4-9 Sample work space using three topology views and Event Viewer

Web Console

For IBM Tivoli Business Systems Manager 3.1, the Web Console was redesigned and now provides improved authentication using IBM WebSphere. It is a functional Web console based on Java that can be used by defined users to manage business systems and events. Some Java console functions, such as hyperview and the topology view, are not replicated in the Web Console. However, business system management is still easily achieved without these features.

The Web Console introduces the Critical Watch List (CWL). This is an administrator-defined list of business systems and individual resources that are kept on the user’s Web Console. From the CWL, a user can see events that are posted to a business system and can drill down, assess the business impact, and take ownership of the event. Actions taken on the Web Console are reflected in all other console types so that, for example, an event owned by a Web Console user shows as being owned in the Java console and the executive dashboard.

Figure 4-10 shows a sample Web Console showing a CWL for a user with the operator role.

Figure 4-10 IBM Tivoli Business Systems Manager Web Console

Executive dashboard

The executive dashboard is a new concept for IBM Tivoli Business Systems Manager 3.1. The executive dashboard is designed to inform senior managers of overall service status without providing technical detail that is not necessary to that level of user. An executive dashboard user can be notified of service status and SLA status but is not notified of problems and incidents that are not impacting the business process. The user can see that a business process is impacted and that the causing incident is being owned and managed. The user can also see when an SLA is trending toward violation and when an SLA is violated. The executive dashboard enables senior management to be aware of business process status without forcing unnecessary training and information onto them.
  • The executive dashboard is a non-intrusive console that can run minimized on a desktop. It is Web-based, accessible via a Uniform Resource Locator (URL), and does not require any code installation on the desktop.

    There are two levels of executive dashboard user: executive and IT executive. The executive-level user is shown only the highest level of alerts and sees only non-technical messages. The IT executive level is intended for more technically aware managers. Therefore, IBM Tivoli Business Systems Manager provides more technical detail to supplement the high-level alerting given to the executive-level user.

    Figure 4-11 shows an executive dashboard that is seen by both executive and IT executive users.

    Figure 4-11 Executive dashboard: One service in yellow status
  • Figure 4-12 shows the different information made available to each user. The dashboard on the left is for the executive user and shows service status. The dashboard on the right is for the IT executive and shows details about the affected resource.

    Figure 4-12 Comparison of drill-down information available to each role (left: Executive User; right: IT Executive User)

    4.2.6 IBM Tivoli Business Systems Manager roles in an SLM context

    IBM Tivoli Business Systems Manager V3.1 has the following user roles available. Each role has privileges and functions that enable users to perform the responsibilities assigned to them. The available roles are:

    - Super administrator
    - Administrator
    - Operator
    - Restricted operator
    - IT executive
    - Executive

    For a full list of functions and privileges available to each role, see IBM Tivoli Business Systems Manager Administrator’s Guide, SC32-9085. The following sections discuss the roles in an SLM context. This is explored further in the practical scenarios covered in Part 2, “Case study scenarios” on page 195.

    Administrator and super administrator

    The IBM Tivoli Business Systems Manager administrator roles are not directly relevant to SLM. That is, administrator users are responsible for administering IBM Tivoli Business Systems Manager views and users rather than SLM.
  • However, the administrator role is responsible for developing the business systems and views used by other roles to aid SLM.

    Super administrators can create and administer CWLs for the Web Console and the equivalent in the Java Console, which is the Critical Resource List (CRL). CRLs are not widely used but are detailed in IBM Tivoli Business Systems Manager Administrator’s Guide, SC32-9085. The administrator cannot perform this task. Other than this, the two roles are identical.

    The IBM Tivoli Business Systems Manager administrator should work closely with the IBM Tivoli Service Level Advisor administrator. This is so that the definition of IBM Tivoli Business Systems Manager Services as IBM Tivoli Service Level Advisor Services can be properly coordinated. See “Marking an IBM Tivoli Business Systems Manager business system as a service” on page 187 for more details.

    Operator

    The operator is responsible for monitoring the whole or parts of the enterprise. This person needs to see all severities of events that affect components of the enterprise. It is good practice to send only events for service level managed resources to operators. Sending events from non-SLM resources can be distracting to operations and divert attention from SLM resources.

    If a system has an SLA, send events to operations so that the system and the SLA can be managed. If a system has no SLA, then operations should not spend effort on resolving events for it.

    Restricted operator

    The restricted operator is the same as the operator with additional restrictions. That is, the restricted operator cannot view all business systems or add resources to their own CRLs.

    IT executives and executives

    IT executives are IBM Tivoli Business Systems Manager roles created especially for SLM. This user ID is an executive Web Console user. Therefore, this person receives IBM Tivoli Service Level Advisor events overlaid onto the relevant IBM Tivoli Business Systems Manager business system.

    The IT executive user receives service status from the business system icon and IBM Tivoli Service Level Advisor statuses for the service on the IBM Tivoli Service Level Advisor icon. They receive detail about the impact of an event as well as the event itself.
  • The executive user also receives service status from IBM Tivoli Business Systems Manager and IBM Tivoli Service Level Advisor. However, this user does not receive details about events.

    Note: These user IDs do not have access to the other IBM Tivoli Business Systems Manager consoles.

    See “Executive dashboard” on page 130 for details and examples about the executive dashboards.

    4.2.7 Understanding your services

    IBM Tivoli Business Systems Manager requires models of real business processes to be built as business systems. To do this successfully, all details of the business processes should be known and, wherever possible, the processes should be fully monitored. This section extends the discussions started in Chapter 2, “General approach for implementing service level management” on page 23, about gathering the necessary information to build a business system.

    Data gathering and business system decomposition

    Figure 4-3 on page 123 shows a schematic for a business system. To build a business system, the IBM Tivoli Business Systems Manager customer must know the structure of the business process. This information must be made available to the IBM Tivoli Business Systems Manager administrator so that this person can build the business system.

    Many business process owners do not know enough about the components that make up their business, and a cycle of business process decomposition has to be performed. This process is not quick or simple and often relies on interviewing many people to extract the necessary information across all of the technologies. See Chapter 2, “General approach for implementing service level management” on page 23, for more details about this process.

    Some work must be done and the information made available to partially map and model a business process. It is possible to have a partially complete business system that enhances management of a business process. Although this situation is not ideal, 80% of a business system is far better than no business system at all. The components that are in the IBM Tivoli Business Systems Manager business system still receive events and show the effect of the event upon the business process.

    The problem is that not all of the business process is represented in IBM Tivoli Business Systems Manager. Therefore, there is a risk of a service-impacting event not being reported to IBM Tivoli Business Systems Manager. This can
  • damage the credibility of both IBM Tivoli Business Systems Manager and the BSM approach. However, using IBM Tivoli Business Systems Manager with the awareness that not all of the business process is covered still gives great value for the parts of the business process that are covered.

    Monitoring gaps can be overcome by using customer-experience software, such as IBM Tivoli Monitoring for Transaction Performance, to report on the end-to-end performance of the business process. It is important that the remaining components of the business system are discovered and defined to IBM Tivoli Business Systems Manager as soon as possible. See “Implementing additional monitoring” on page 113 for an overview of the methods to fill in the gaps.

    Enhancing monitoring

    Business process decomposition frequently shows monitoring gaps. These occur when some components of the business process are not under the control of a systems management tool or organization. This is a common occurrence that is difficult to overcome quickly. It can be possible to plug gaps with existing systems management tools and then integrate them into IBM Tivoli Business Systems Manager. However, there are often going to be gaps in the end-to-end monitoring of the business process.

    It can be argued that an early benefit of IBM Tivoli Business Systems Manager is that it drives the customer to discover gaps in their monitoring. Regardless of the BSM tool that is used, gaps in the monitoring of a business process are undesirable and should be closed as soon as possible. For large monitoring gaps, a delay to the IBM Tivoli Business Systems Manager implementation should be considered while the gaps are filled.

    There are situations where a large part of the business process is not monitored because it is outside of the remit of the customer. A common example of this is when the network is outsourced. It is not desirable to bring network monitoring back in-house for IBM Tivoli Business Systems Manager, because then both the network providers and the IBM Tivoli Business Systems Manager users monitor the network.

    If you prefer to have end-to-end monitoring and want to include the network, we recommend that you use IBM Tivoli Monitoring for Transaction Performance V5.3 to replay transactions and measure the network latency. Any severe network latency in the sample transactions can be reported to IBM Tivoli Business Systems Manager. For details about IBM Tivoli Monitoring for Transaction Performance network latency measurements, see IBM Tivoli Monitoring for Transaction Performance V5.3 Administrator’s Guide, GC32-9189.
  • 4.2.8 Using IBM Tivoli Business Systems Manager 3.1 features for the benefit of SLM

    Of the many new features in IBM Tivoli Business Systems Manager V3.1, two of the most useful ones for effective SLM are resource level propagation (RLP) and percentage-based thresholding (PBT).

    Resource level propagation

    RLP is a new feature of IBM Tivoli Business Systems Manager V3.1. In previous versions of IBM Tivoli Business Systems Manager, propagation threshold changes affected every instance of an object type. In IBM Tivoli Business Systems Manager V3.1, RLP is available and can be used to change the propagation behavior at object level rather than at type level.

    RLP allows an administrator to set exception and child event thresholds for individual object instances. An administrator can use it to ensure that propagation behavior can be controlled at object level so that a business system can be customized exactly to suit requirements.

    When RLP is carried out, the administrator sets the RLP settings for child events for an object so that the events from objects further down the tree do not propagate onto the object. This is explained in “Defining rules for the scenario” on page 140.

    Figure 4-13 shows an example of RLP definitions for the child events of an object named ATM Network. The definitions allow propagation for these situations:

    - Propagate any yellow event.
    - Propagate the seventh low red event received from child objects.
    - Propagate the fifth medium red event received from child objects.
    - Propagate the third high red event received from child objects.
    - Propagate all critical events.
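The child event thresholds listed above for the ATM Network object can be read as "propagate once the count of events of a given severity reaches its threshold". The following Python sketch models that reading only. The severity keys, the dictionary, and the function name are our own illustrative assumptions and are not part of the IBM Tivoli Business Systems Manager interface.

```python
# Hypothetical model of the child event thresholds described for "ATM Network".
# Keys and the counting rule are assumptions made for illustration only.
CHILD_EVENT_THRESHOLDS = {
    "yellow": 1,      # propagate any yellow event
    "low_red": 7,     # propagate the seventh low red event
    "medium_red": 5,  # propagate the fifth medium red event
    "high_red": 3,    # propagate the third high red event
    "critical": 1,    # propagate all critical events
}


def should_propagate(severity: str, events_received: int,
                     thresholds: dict = CHILD_EVENT_THRESHOLDS) -> bool:
    """Return True once the number of child events reaches the threshold."""
    return events_received >= thresholds[severity]


if __name__ == "__main__":
    for severity, count in [("low_red", 6), ("low_red", 7),
                            ("high_red", 3), ("critical", 1)]:
        print(severity, count, should_propagate(severity, count))
```

Under this reading, a threshold that is larger than the number of events that can ever arrive effectively switches propagation off for that severity.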
  • Figure 4-13 RLP set for red child events only

    Percentage-based thresholding

    With the PBT method, a group of immediate, weighted child resources is monitored by rules. When a percentage of these resources have an alert state (such as red), a preconfigured event is sent to the parent object where the PBT rules are set.

    PBT rules are triggered when the following formula is satisfied:

        %age_Min =< ((Alert_Weight / All_Weight) x 100) =< %age_Max

    In this formula, note the following explanation:

    - %age_Min: The lower limit of the PBT rule percentage range
    - Alert_Weight: The total weight of resources in the desired alert state (for example, red)
    - All_Weight: The weight of all resources in the scope of the PBT rule
    - %age_Max: The upper limit of the PBT rule percentage range
  • A simple illustration is where four objects are covered by a rule. The objects each have a weight of 25 and the rule has to fire when three of the objects are red. Three red objects is 75%, so the rule fires when 75% of the objects are red. We set the range from 51% to 76% so that the rule doesn’t fire when two or four objects are red. This gives us the following values:

    - %age_Min = 51 (more than two reds)
    - Alert_Weight = 75 (three reds)
    - All_Weight = 100 (all four resources)
    - %age_Max = 76 (less than four reds)

    The formula is:

        51 =< ((75 / 100) x 100) =< 76    TRUE

    If only two objects were red, then the formula is:

        51 =< ((50 / 100) x 100) =< 76    FALSE

    For a practical run through PBT, see 4.2.9, “Using PBT and RLP to manage high availability scenarios” on page 139, and Chapter 6, “Case study scenario: Greebas Bank” on page 315.

    Before you can use PBT, you must enable it for use by the IBM Tivoli Business Systems Manager Administrator. You do this using the Administrator Preferences option (see Figure 4-14). After PBT is enabled, you see the Propagation tab in an object’s properties window.
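As a rough check of the PBT formula and the worked values above, the following Python sketch evaluates the same inequality. The function and parameter names are our own and only mirror the terms of the formula. They are not an IBM Tivoli Business Systems Manager API.

```python
def pbt_rule_fires(alert_weight: float, all_weight: float,
                   pct_min: float, pct_max: float) -> bool:
    """Evaluate pct_min <= (alert_weight / all_weight) x 100 <= pct_max."""
    if all_weight <= 0:
        raise ValueError("all_weight must be positive")
    percentage = alert_weight / all_weight * 100
    return pct_min <= percentage <= pct_max


if __name__ == "__main__":
    # Four objects with a weight of 25 each, rule range 51% to 76%.
    for red_objects in range(5):
        fires = pbt_rule_fires(red_objects * 25, 100, 51, 76)
        print(f"{red_objects} red object(s): rule fires = {fires}")
```

With the values from the illustration, only the case of three red objects (an alert weight of 75) satisfies the range.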
  • Figure 4-14 Enabling resource level propagation

    4.2.9 Using PBT and RLP to manage high availability scenarios

    Using PBT and RLP together enables the administrator to customize business systems to suit specific user roles and preferences. Chapter 5, “Case study scenario: IRBTrade Company” on page 197, and Chapter 6, “Case study scenario: Greebas Bank” on page 315, detail a practical exploitation of these features to control which role sees which event in a business system.

    As an introduction to RLP and PBT, we provide a simple scenario where we use RLP and PBT together to manage a set of high-availability servers. In this scenario, there is a business system of four servers. The servers function together as high availability load-sharing servers. All four servers perform the same role. However, peak throughput of work on the servers is equal only to two
  • servers running at full capacity. The extra servers are provided for redundancy and service resiliency and to spread the workload across all the servers.

    Due to the overcapacity of the servers, up to two servers can be impacted by red events before there is a likelihood of the service being degraded. If three servers are impacted, there is a risk of service degradation because all the work is likely to be performed by one server. If all four servers are impacted, the service is severely impacted and possibly down.

    In this scenario, we use RLP and PBT to ensure the following criteria:

    - Any red or yellow objects: Show alerts on affected objects.
    - Up to two red or four yellow objects: Don’t propagate to the PBT Demo business system.
    - Three red objects: Propagate a yellow alert to PBT Demo.
    - Four red objects: Propagate a red alert to PBT Demo.
    - Remove PBT alerts when only two red alerts remain on objects.

    This scenario demonstrates two desired event behaviors that are now possible with IBM Tivoli Business Systems Manager V3.1:

    - Managing redundant groups
    - Sending a yellow event from receiving red events

    Figure 4-15 PBT Demo business system (annotation in figure: Set PBT and RLP against this business system)

    Defining rules for the scenario

    To set up the necessary RLP and PBT settings to satisfy the previous scenario, you follow these stages as explained in the following sections:

    1. Set RLP to stop child events from propagating.
    2. Create PBT rules for four red objects and three red objects.
    3. Create a clearing rule for two red objects.
  • Setting RLP to stop child events from propagating

    From the redundancy business system, you go to the Redundancy Properties window and click Child Events in the left panel, as shown in Figure 4-16. Then you set all thresholds to 100. In doing so, the threshold far exceeds the number of child objects and so it is never reached. This stops events from the child objects propagating up to this business system and beyond.

    Figure 4-16 Using RLP to stop child event propagation

    Usually, you must set the RLP at the level directly above the objects that are to be manipulated by RLP. If we set RLP at the PBT Demo business system, then the only child events that can propagate to this business system would have a priority of Critical.

    Creating PBT rules for four red objects and three red objects

    You must set the PBT threshold rules at one level above the objects that are affected by the PBT rules because the scope of the PBT rules is the objects in the next level down the tree. In this case, you set the rules against the redundancy business system.
  • You start with the easiest rule to define, which is to send a red event when all four objects are red. Each object represents 25% of the total, so the percentage criterion for this rule is that between 76% and 100% of the in-scope objects are red. The rule fires only when all four objects are red. See Figure 4-17.

    It is equally correct for this rule to specify 100% as both the minimum and maximum percentage. However, for more complex PBT rules, it helps to define the rules so that every percentage falls within some rule. As the math becomes more complex, the need to ensure that all percentages are covered by rules increases.

    Figure 4-17 Rule 1: Severe impact
  • This rule sends a critical red event when its criteria are satisfied. The event is posted against the redundancy business system object. Because this event is posted against the actual object, it is not a child event and so is not affected by the RLP settings made previously. The RLP settings affect only child events. The posted event is also propagated to the PBT Demo business system as desired.

    The second rule covers the situation of three red child objects. The percentage range of this rule is between 51% and 75%, so it fires only when three of the four objects have a red event against them. See Figure 4-18. Three red events cause a yellow event to be posted to the redundancy business system object and up to PBT Demo as desired.

    Figure 4-18 Rule 2: Service degraded
  • The ability to send a yellow event on receipt of red child events adds a lot of flexibility to IBM Tivoli Business Systems Manager. It also enables a lower severity event to be sent when the service is, for example, degraded but still available and working.

    Creating a clearing rule for two red objects

    The third rule is to clear out the PBT-generated alerts when the situation of three or four red objects no longer occurs. Clearance can happen either when the events are owned or when the events are cleared by a green status event being sent to the objects. See Figure 4-19.

    Figure 4-19 Rule 3: Clearing PBT-generated alerts
  • Although some of the objects may have an outstanding red status, the green status is posted to the top-level business system because enough components are available and the business process is no longer impacted.

    Figure 4-20 shows the completed Propagation properties for the redundancy business system. All of the child objects have an equal weight of 100, so they are included in the PBT calculations. The three rules described earlier are set and now the business system is ready to manage this high availability scenario.

    Figure 4-20 Redundancy business system: Properties

    Testing the scenario

    You send an event to each of the objects. Two objects receive low-priority yellow events. Two objects receive high-priority red events.
  • The rules dictate that two reds do not cause propagation to the top-level business system. They also prevent propagation of any number of yellow events to the top-level business system. Without the rules, the red and yellow events would propagate to the PBT Demo business system. Figure 4-21 shows that the rules are holding. In this case, the RLP rules and the third PBT rule are in use.

    Figure 4-21 Two reds, two yellows: No PBT events

    A third red event is sent to the objects in the business system. This causes PBT rule 2 to fire. This rule is set to trigger when there are three red objects in the business system and to propagate a yellow event up to the high-level business system. Figure 4-22 shows how this happens.
  • Figure 4-22 Three reds: PBT rule 2 fired, yellow event sent
  • A fourth red event is sent, so PBT rule 1 is triggered and sends a red event to the PBT Demo business system. This is shown in Figure 4-23.

    Figure 4-23 Four reds: PBT rule 1 fired, red event sent
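The test sequence so far (two, three, and then four red objects) can be traced with a small Python sketch of the three PBT rules set against the redundancy business system. The ranges for rules 1 and 2 come from the text. The 0% to 50% range for the clearing rule is our assumption, because the text does not state it, and the data structure and function name are purely illustrative.

```python
from typing import Optional

# (rule name, minimum %, maximum %, event posted to the redundancy business system)
PBT_RULES = [
    ("Rule 1: Severe impact", 76, 100, "red"),
    ("Rule 2: Service degraded", 51, 75, "yellow"),
    ("Rule 3: Clearing", 0, 50, "green"),  # range assumed, not given in the text
]


def pbt_event(red_objects: int, total_objects: int = 4) -> Optional[str]:
    """Return the event posted for a number of red child objects of equal weight."""
    percentage = red_objects / total_objects * 100
    for _name, pct_min, pct_max, event in PBT_RULES:
        if pct_min <= percentage <= pct_max:
            return event
    return None  # no rule covers this percentage


if __name__ == "__main__":
    for reds in (2, 3, 4):
        print(f"{reds} red object(s) -> {pbt_event(reds)} event")
```

Because all four child objects carry the same weight, the ratio of red objects equals the ratio of alert weight to total weight, so counting objects is enough for this sketch.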
  • When two of the events are owned, PBT rule 3 is triggered because, in this case, the alerts have been cleared from the objects. This sets them to a green status, so PBT rule 3 is eligible to fire. Figure 4-24 shows this.

    Figure 4-24 Events owned: PBT events cleared

    Compare Figure 4-24 with Figure 4-25, where the alerts are not cleared from the owned events, so the objects stay red and PBT rule 1 is still in effect.

    Attention: The option to clear alerts from resources when taking ownership can be set globally by the IBM Tivoli Business Systems Manager Administrator using Administrator Preferences. By default, the alert is left posted against the resource. The user can override this in the Take Ownership window. The administrator can change the default to clear all alerts and can remove the override option from the Take Ownership window.
  • Figure 4-25 Events owned: PBT events not cleared

    4.3 Tivoli Data Warehouse V1.2

    Tivoli Data Warehouse enables IBM Tivoli Business Systems Manager data to pass to IBM Tivoli Service Level Advisor. It is the standard data store for Tivoli products. This section presents an overview of Tivoli Data Warehouse. It also discusses how IBM Tivoli Business Systems Manager data is stored in Tivoli Data Warehouse and how that data is extracted for use by IBM Tivoli Service Level Advisor.

    Tivoli Data Warehouse is used to store, aggregate, and correlate the data from various monitoring applications. A typical data warehouse environment involves
  • source and target databases. Such an environment enables the monitoring applications to run independently of each other. Data is moved from the source database to the Tivoli Data Warehouse database using extract, transform, and load (ETL) steps.

    Since the monitoring applications used in this solution provide warehouse enablement packs (WEP), we deploy them for collecting monitoring and measurement data into the Tivoli Data Warehouse environment. Each application has a unique code identifying the application data in Tivoli Data Warehouse. The main task is to schedule the execution of these WEPs.

    The data must be stored, aggregated, and correlated from the source application databases into the data warehouse datamart databases. Therefore, it is essential for each WEP to complete its run before the next cycle. The size of the databases in Tivoli Data Warehouse depends on the size of the IT enterprise.

    IBM Tivoli Service Level Advisor mines data from Tivoli Data Warehouse. Therefore, you must schedule the WEPs so that the IBM Tivoli Service Level Advisor ETL runs after the completion of all the ETLs for the monitoring applications that provide data to IBM Tivoli Service Level Advisor, including IBM Tivoli Business Systems Manager. If an organization has monitoring applications, you must install the WEPs of these applications on the control center of the Tivoli Data Warehouse. Refer to the documentation provided to install these WEPs.

    The planning gives an estimated time to run each of these WEPs. Table 4-1 provides timing estimates.

    Table 4-1 Monitoring applications with estimated runtime

    Monitoring application                                              Estimated daily run time
    ------------------------------------------------------------------  ------------------------
    IBM Tivoli Monitoring for Web Infrastructure V5.1.2: WebSphere      15 minutes
    IBM Tivoli Monitoring for Web Infrastructure V5.1.2: Apache Server  15 minutes
    IBM Tivoli Monitoring for Databases V5.1.0: DB2                     35 minutes
    IBM Tivoli Monitoring for Transaction Performance V5.3              20 minutes
    IBM Tivoli Monitoring for Web Infrastructure V5.1.2: OS Pack        40 minutes
    Peregrine Service Center                                            10 minutes

    Schedule the WEP of each application according to the estimated times. Set the WEP to run in test mode to confirm the estimated times. When you know the times, schedule the WEP accordingly and then move its steps into production mode. Similarly, plan and test the runtime for the WEP of IBM Tivoli Business Systems Manager.
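To see how the estimates in Table 4-1 translate into a schedule, the following Python sketch lays the WEPs out one after another and reports when the last one is expected to finish. It assumes that the WEPs run strictly back to back with no overlap or safety margin, which is our simplification, and the start time, the shortened application names, and the function name are hypothetical.

```python
from datetime import datetime, timedelta

# Estimated daily run times in minutes, taken from Table 4-1 (names shortened).
WEP_RUNTIMES = [
    ("ITM for Web Infrastructure: WebSphere", 15),
    ("ITM for Web Infrastructure: Apache Server", 15),
    ("ITM for Databases: DB2", 35),
    ("ITM for Transaction Performance", 20),
    ("ITM for Web Infrastructure: OS Pack", 40),
    ("Peregrine Service Center", 10),
]


def build_schedule(first_start: datetime, runtimes: list) -> list:
    """Return (name, start, end) tuples for WEPs run one after another."""
    schedule = []
    start = first_start
    for name, minutes in runtimes:
        end = start + timedelta(minutes=minutes)
        schedule.append((name, start, end))
        start = end
    return schedule


if __name__ == "__main__":
    first = datetime(2004, 12, 1, 0, 30)  # hypothetical start of the first WEP
    for name, start, end in build_schedule(first, WEP_RUNTIMES):
        print(f"{start:%H:%M}-{end:%H:%M}  {name}")
```

The estimates in the table add up to 135 minutes, so any WEP that depends on all of them should not start earlier than that after the first WEP, plus the run time of any further WEP that has to finish first.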
  • Frequency of ETL runs

    The frequency of ETL runs depends on the frequency of data collection by the source monitoring applications. If a source application collects data at the end of each day, then the WEPs, including the IBM Tivoli Service Level Advisor WEP, can be scheduled to run every day.

    We recommend that you schedule the ETL to cover the least granular of the source applications. For example, if IBM Tivoli Monitoring for Transaction Performance is scheduled to collect data into its database at 4 a.m. every day, and IBM Tivoli Monitoring for Operating Systems is scheduled to collect data into its database every four hours starting at 00:00 hours, then the first ETL can be scheduled to run every four hours starting at 00:30 hours or every day at 04:30 hours. Other ETLs are scheduled to run subsequently. Scheduling the ETL this way ensures that all the data is extracted, transformed, and loaded into the central data warehouse (CDW) database with minimum performance issues.

    Using the IBM Tivoli Service Level Advisor ETL to extract Tivoli product data from Tivoli Data Warehouse

    As we explain in Chapter 3, “IBM Tivoli products that assist in service level management” on page 53, IBM Tivoli Service Level Advisor uses a set of ETL steps to extract data from the CDW database into the SLM databases. The ETL steps in IBM Tivoli Service Level Advisor are grouped into four processes. Figure 4-26 displays the four ETL processes for IBM Tivoli Service Level Advisor with msrc_cd value DYK. The details for each process are:

    - DYK_m00_Initiate_Process: This process is not to be scheduled. It is meant to be run only once, after migrating from previous versions of IBM Tivoli Service Level Advisor.
    - DYK_m05_Populate_Registration_Datamart_Process: This process extracts the resource definition data (component types, measurement types, attributes, and so on) from the CDW to the SLM database.
    - DYK_m10_Populate_Measurement_Datamart_process: This process extracts the measurement data of the resources from the CDW to the SLM database.
    - DYK_m15_Purge_Measurement_Datamart_process: This process prunes the aging measurement data periodically.
  • Figure 4-26 ETL processes for IBM Tivoli Service Level Advisor WEP

    The DYK_m05_Populate_Registration_Datamart_Process is referred to as the Registration ETL. The Registration ETL extracts the measurement type data, the component type data, and the corresponding rules from the CDW to the SLM database. It also extracts the components, their attributes, and other relationships into the SLM database. This data helps in defining the service level objectives and SLAs.

    By default, the Registration ETL does not extract any of the available data types from the CDW until they are enabled. Before you run this step, you must enable specific source applications in IBM Tivoli Service Level Advisor. To determine the available types of data in the CDW, connect to the central data warehouse database (twh_cdw) from a DB2 command window and execute a select command as follows:

        db2 connect to twh_cdw user <db2_Inst_Owner_ID> using <db2_Inst_Owner_PW>
        db2 "select * from twg.msrc"
  • This command has output similar to what is shown in Example 4-1.

    Example 4-1 Contents of the twg.msrc table

    MSRC_CD MSRC_PARENT_CD MSRC_NM
    ------- -------------- -----------------------------------------------------
    AMX     Tivoli         IBM Tivoli Monitoring
    AMY     AMX            IBM Tivoli Monitoring for Operating Systems
    BWM     Tivoli         IBM Tivoli Monitoring For Transaction Performance 5.2
    CTD     AMX            IBM Tivoli Monitoring for Databases: DB2
    DYK     -              IBM Tivoli Service Level Advisor 2.1 Data Consumer
    EVENTS  -              Events
    GWA     AMX            IBM Tivoli Monitoring for Web Infrastructure, Version 5.1.0: Apache HTTP Server
    IZY     AMX            IBM Tivoli Monitoring for Web Infrastructure, Version 5.1.0: WebSphere Application Server
    MODEL1  -              Tivoli Common Data Model V1
    SDESK1  -              Service Desk
    SHARED  -              Shared
    SNMP    -              Simple Network Management Protocol
    Tivoli  -              Tivoli Application

    For example, if SLAs must be defined using data from IBM Tivoli Monitoring for Operating Systems, then the value in the MSRC_CD column for that source application (AMY) must be enabled in IBM Tivoli Service Level Advisor. To do this, from the IBM Tivoli Service Level Advisor server machine, follow these steps:

    1. Launch a command window and change the directory to the location of the IBM Tivoli Service Level Advisor installation (C:\TSLA, for example).
    2. Run the following command for your system:
       - For Windows:
             slmenv.bat
       - For UNIX:
             . ./slmenv.sh
    3. Run the command:
             scmd etl getApps
       This lists the applications that were added, as shown in Example 4-2.
  • Example 4-2 List of source applications added by default

    Measurement Source Code: BWM
    Application Name: Tivoli Web Services Manager
    Flag: N
    Measurement Source Code: APF
    Application Name: Tivoli Application Performance Management
    Flag: N
    Measurement Source Code: DMN
    Application Name: Distributed Monitoring Classic Edition
    Flag: N
    Measurement Source Code: GTM
    Application Name: Tivoli Business System Manager
    Flag: N
    Measurement Source Code: ECO
    Application Name: Tivoli Enterprise Console
    Flag: N
    Measurement Source Code: MODEL1
    Application Name: Tivoli Common Data Model v1
    Flag: N
    Measurement Source Code: AMW
    Application Name: IBM Tivoli Monitoring
    Flag: N

    4. If the required source application is not listed, then enable the data sources using the codes as listed in Example 4-1. Add and enable the codes that apply.
             scmd etl addApplicationData <msrc_cd> <msrc_nm>
             scmd etl enable <msrc_cd>
       Here msrc_cd and msrc_nm are listed in Example 4-1. An example of this is:
             scmd etl addApplicationData AMY "IBM Tivoli Monitoring for Operating Systems"
             scmd etl enable AMY

    The process is the same for all the other source applications for which SLAs are to be created. Some applications may use the Tivoli Common Data Model, whose msrc_cd is MODEL1. This is documented in each individual WEP document. Check the TWG.MsmtTyp table. If it says MODEL1 in the msrc_cd column, then enable MODEL1.

    The DYK_m10_Populate_Measurement_Datamart_process is also referred to as the Process ETL. This process extracts the measurement data that is related to the components and measurement types that were extracted in the previous ETL process. This data is then evaluated for the existing SLAs.

    Assuming that the runtime of the IBM Tivoli Business Systems Manager WEP is 15 minutes, schedule the IBM Tivoli Service Level Advisor WEP for two hours
  • and 30 minutes after the first WEP is scheduled. This interval matches the sum of the estimated run times in Table 4-1 (135 minutes) plus the 15 minutes assumed for the IBM Tivoli Business Systems Manager WEP. It ensures that IBM Tivoli Service Level Advisor obtains all the information from the Tivoli Data Warehouse database. It also prevents SLAs from going unevaluated, because the evaluation of the data is tied to the completion of the IBM Tivoli Service Level Advisor WEP.

    4.4 IBM Tivoli Service Level Advisor V2.1

    In complex IT environments, business applications depend on the availability and performance of IT resources. It is important to define the various SLOs of these business applications. IBM Tivoli Service Level Advisor provides the ability to define the SLOs of the business applications. SLOs typically contain various metrics such as availability of an application and server and response time of a transaction. These metrics are all measured over a predetermined period of time as agreed in the SLA between the provider and receiver of the service.

    IBM Tivoli Service Level Advisor analyzes the data provided to Tivoli Data Warehouse by various monitoring applications for the resources hosting the various business applications. IBM Tivoli Service Level Advisor uses the data to calculate the status of the service levels. Then, if necessary, IBM Tivoli Service Level Advisor escalates the service level status of the business applications in case of a violation or a trend toward violation.

    SLOs of the various resources can be mapped to a business application or system. The service provider can show the service levels of any application.

    4.4.1 Building SLAs in IBM Tivoli Service Level Advisor

    This section explains how to create SLAs in IBM Tivoli Service Level Advisor. The resource base to access all the information needed to build an SLA is the business services defined in IBM Tivoli Business Systems Manager.

    The tasks to create an IBM Tivoli Business Systems Manager-based SLA are:

    1. Ensure that data from all monitoring applications, including IBM Tivoli Business Systems Manager data, is in the IBM Tivoli Service Level Advisor database.
    2. Define schedules for IBM Tivoli Service Level Advisor.
    3. Create and publish a service offering.
    4. Create an SLA and assign the offering to it.
  • Defining the schedules

    The day is divided into various periods to reflect how critical each part of the day is to the business. For example, banking hours are from 9 a.m. to 5 p.m., Monday through Friday. We define periods so that higher SLOs apply during these hours.

    In another example, for online banking, it is critical to be operational every day. However, the response times for the transactions can vary depending on the time of the day. You can define periods to reflect this scenario as illustrated in Figure 4-27 for the banking business schedule.

    Figure 4-27 Banking business schedule

    In IBM Tivoli Service Level Advisor, you can define two types of schedules: auxiliary and business schedules. The periods defined in auxiliary schedules take precedence over the periods defined in a business schedule.

    Auxiliary schedules are used to define the schedule periods that are common to all the business units in the organization. For example, you can include the holidays of the organization, when the service levels of the objectives don’t matter. Similarly, auxiliary schedules are used to define a maintenance period. You can include one or more auxiliary schedules in a business schedule, but auxiliary schedules cannot contain an auxiliary or a business schedule.

    Enabling hourly evaluation in IBM Tivoli Service Level Advisor

    IBM Tivoli Service Level Advisor supports the evaluation of SLOs every hour, two hours, three hours, four hours, six hours, eight hours, daily, weekly, and monthly. By default, only the daily, weekly, and monthly intervals are available. To enable the hourly intervals, run the following command from an IBM Tivoli Service Level Advisor environment-enabled command window:

      scmd mem showHourlyFrequencyIntervals enable

    Creating SLOs with an hourly frequency depends on the source monitoring application data being collected and extracted into the CDW database within that frequency. If you do not consider these items, you may receive unwanted results.
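The precedence rule between auxiliary and business schedules can be illustrated with a small model. The following Python sketch is not part of IBM Tivoli Service Level Advisor; the `Period` class, the `resolve_state` function, and the Monday maintenance window are hypothetical names and values chosen for illustration. It assumes only what the text states: a period from an auxiliary schedule wins over a period from the business schedule when both cover the same moment.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Period:
    """A weekly schedule period (hypothetical model, not the product's own)."""
    state: str                 # for example "Critical" or "No Service"
    weekdays: FrozenSet[int]   # 0 = Monday ... 6 = Sunday
    start: time                # inclusive
    end: time                  # exclusive

    def covers(self, moment: datetime) -> bool:
        return (moment.weekday() in self.weekdays
                and self.start <= moment.time() < self.end)


def resolve_state(moment: datetime,
                  business: List[Period],
                  auxiliary: List[Period]) -> Optional[str]:
    """Return the state in effect, checking auxiliary periods first."""
    for period in list(auxiliary) + list(business):
        if period.covers(moment):
            return period.state
    return None  # no period defined for this moment


# Banking hours from the text: 9 a.m. to 5 p.m., Monday through Friday.
business = [Period("Critical", frozenset(range(5)), time(9, 0), time(17, 0))]
# Hypothetical auxiliary period that overlaps the banking hours on Mondays.
auxiliary = [Period("No Service", frozenset({0}), time(9, 0), time(10, 0))]

print(resolve_state(datetime(2004, 10, 11, 9, 30), business, auxiliary))
print(resolve_state(datetime(2004, 10, 11, 11, 0), business, auxiliary))
```

Checking the auxiliary list first is the simplest way to express the precedence; a real schedule also has single-date periods such as holidays, which this sketch leaves out.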
  • Building an offering

    We need a lot of information to build an offering. We concentrate on two items because they are less obvious than the rest. For a full, practical walk-through of defining an offering, see Chapter 5, “Case study scenario: IRBTrade Company” on page 197, and Chapter 6, “Case study scenario: Greebas Bank” on page 315. The two items are:

    – How to select the right resource type
    – How to select the evaluation and intermediate evaluation frequencies

    Selecting the resource type

    Through the business system view in IBM Tivoli Business Systems Manager, you see which components support a given service. For example, in Figure 4-28, you see the resources that support the Online Accounts service.

    Figure 4-28 TBSM business view showing resources that support services
  • We monitor a metric of either one business system or a component inside it. With this information, use the following steps to determine the resource type to select in the IBM Tivoli Service Level Advisor offering.

    1. From the metric and the type of the component, determine which application is used to monitor it. If this application is not installed yet, install it and its WEP. Then enable it in IBM Tivoli Service Level Advisor. Refer to Getting Started with IBM Tivoli Service Level Advisor, SC32-0834-03.
    2. Find the application’s Warehouse Enablement Pack Implementation Guide on the application CD that contains the WEPs. Go to the directory that contains the WEPs (its name should contain an acronym such as wep, tdw, tedw, or etl). Then go down through the directories until you find the doc directory, which contains the document.
    3. In the document, look for the following tables:
       – Measurement type (table MsmtTyp)
       – Component measurement rule (table MsmtRul)
    4. Look either for a metric in the MsmtTyp table or for a component type in the MsmtRul table. If you start from the MsmtTyp table, note the MsmtTyp_ID (first column) of the metric that you selected and find the corresponding CompTyp_CD in the MsmtRul table. Sometimes more than one CompTyp_CD corresponds to a given metric. Choose the one that you want to monitor. At the end of this step, you should have a component type (CompTyp_CD column in the MsmtRul table).
    5. Find the Component Type table (table CompTyp). With the CompTyp_CD, find the corresponding CompTyp_Nm. This is the resource type to specify in the IBM Tivoli Service Level Advisor offering. If the previous step produced more than one component type, this table can help you decide which one to choose, because it gives you more information about each component type.

    For example, with IBM Tivoli Business Systems Manager, the MsmtRul table in the enablement guide lists only one component type, BUSINESS_SYSTEM. This translates to Business System in the CompTyp table. This is the resource type to choose during the offering creation. IBM Tivoli Monitoring for Transaction Performance is another simple case. In the enablement guide, the MsmtRul table has only one component type, BWM_TX_NODE, which translates to Transaction Node in the CompTyp table.

    As another example, suppose that you want to use the CPU utilization of one of your servers as part of an SLA. IBM Tivoli Monitoring can collect this metric, specifically through IBM Tivoli Monitoring for Operating Systems. In the enablement guide, look at the MsmtTyp table, search for the word CPU in the metric names, and select, for example, Percent of time that the CPU is idle. This corresponds to MsmtTyp_ID 47.
  • In the MsmtRul table, 47 corresponds to AMY_CPU. In the CompTyp table, AMY_CPU is a system processor. Use this as the resource type inside the offering.

    In a third example, you want the number of HTTP sessions as the metric. You can collect this metric with IBM Tivoli Monitoring for Web Infrastructure. In the enablement guide, in the MsmtTyp table, choose the Number of concurrently live servlet sessions (load) metric. This is MsmtTyp_ID 15. In the MsmtRul table, 15 corresponds to IZY_SERVLET_SESS. In the CompTyp table, IZY_SERVLET_SESS is the IBM WebSphere servlet session.

    During the creation of the offering in IBM Tivoli Service Level Advisor, in the Select Resource Type pane (Figure 4-29), select one entry in the tree on the left. The resource types are then displayed in the table on the right. The resource type that you want for the offering may already appear in the table in the left panel. This happens, for example, for the Business System and Transaction Node resource types.

    Figure 4-29 Select Resource Type table
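The lookup chain described in steps 3 to 5 (MsmtTyp_ID, then CompTyp_CD, then CompTyp_Nm) can be sketched as two table lookups. This Python example is only an illustration: the dictionaries hold just the rows quoted in the examples above, the guide labels used as keys are invented for this sketch, and `resource_types_for` is a hypothetical helper. In practice the values come from each application's enablement guide.

```python
from typing import Dict, List, Tuple

# (guide label, MsmtTyp_ID) -> CompTyp_CD values from that guide's MsmtRul table.
# The guide labels are hypothetical keys; MsmtTyp_IDs are only meaningful per guide.
MSMT_RUL: Dict[Tuple[str, int], List[str]] = {
    ("ITM for Operating Systems", 47): ["AMY_CPU"],
    ("ITM for Web Infrastructure", 15): ["IZY_SERVLET_SESS"],
}

# CompTyp_CD -> CompTyp_Nm, as quoted in the text from the CompTyp tables.
COMP_TYP: Dict[str, str] = {
    "AMY_CPU": "System Processor",
    "IZY_SERVLET_SESS": "IBM WebSphere servlet session",
    "BUSINESS_SYSTEM": "Business System",
    "BWM_TX_NODE": "Transaction Node",
}


def resource_types_for(guide: str, msmt_typ_id: int) -> List[str]:
    """Return the candidate resource type names for a metric.

    More than one name may come back when several component types
    carry the same metric; the caller then picks the one to monitor.
    """
    codes = MSMT_RUL.get((guide, msmt_typ_id), [])
    return [COMP_TYP[code] for code in codes]


print(resource_types_for("ITM for Operating Systems", 47))
```

Returning a list rather than a single name mirrors step 4, where one metric can map to several component types.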
  • For System Processor, notice that it does not appear in the table. To display it, select Host Monitored by IBM Tivoli Monitoring. This shows a table with three pages. If you advance to the last page, you see the System Processor resource type as shown in Figure 4-30. After you select a resource type, click Next and then click Add. You then reach the Select Metrics page. From here, follow the steps that are presented in Part 2, “Case study scenarios” on page 195.

    Figure 4-30 System processor resource type
  • Selecting the evaluation frequency

    The evaluation frequency depends on the reporting period that was defined in the signed SLA. It is usually monthly, but can be weekly or even daily. If intermediate evaluations are used, the minimum evaluation frequency that can be used depends on the variables discussed in “Defining the schedules” on page 157. Intermediate evaluations, by default, have only a daily frequency. They can also run at an hourly frequency, but only after hourly frequency is enabled. Assume that the minimum evaluation frequency is every four hours and the evaluation frequency is monthly. In this case, the intermediate evaluation frequency is daily.

    Building SLAs

    This section explains how to select a service, how to select a resource, and how to select the SLA Start Date when creating the SLA in IBM Tivoli Service Level Advisor. For a full walk-through of the SLA definition, refer to Part 2, “Case study scenarios” on page 195.

    Selecting the service

    On the Select Service page, associate the SLA with the business service that describes the service the SLA is monitoring. In this case, the name of the service is the same as the business system in IBM Tivoli Business Systems Manager. Define the business system in IBM Tivoli Business Systems Manager as a service to allow the association of an SLA with it. Refer to “Marking an IBM Tivoli Business Systems Manager business system as a service” on page 187 to do this. Then run the IBM Tivoli Business Systems Manager WEP. Also run both IBM Tivoli Service Level Advisor Registration ETLs (Populate Registration and Populate Measurement) to make the information about the newly created service available on the Select Service page.

    For example, assume that you are creating an SLA for the Online Accounts business system shown in Figure 4-28 on page 158. On the Select Service page, you select the Online Accounts service as shown in Figure 4-31.
  • Figure 4-31 Online Accounts service

    Selecting the resource

    There are two ways to define resources in IBM Tivoli Service Level Advisor: dynamic and static.

    In the case of a dynamic list of resources, we define a set of filters, and any resources that match the filters are used to calculate that specific SLO. If a new resource is added that matches the filters, this new resource is also included in the SLO calculation.

    Static resources are selected using filtering criteria. There are no automatic additions to the resources that are selected, even if a new resource matches the filter.
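The difference between the two kinds of resource lists comes down to when the filter is applied. The following Python sketch is a conceptual model only, not the product's implementation: `DynamicResources`, `StaticResources`, and `policy_filter` are hypothetical names, and the resource S2STI-TBSMWebCons_68 is invented to play the role of a newly added resource.

```python
from typing import Callable, Iterable, List

Filter = Callable[[str], bool]


class DynamicResources:
    """Keeps the filter and re-applies it at every evaluation."""

    def __init__(self, matches: Filter) -> None:
        self._matches = matches

    def members(self, inventory: Iterable[str]) -> List[str]:
        return [name for name in inventory if self._matches(name)]


class StaticResources:
    """Applies the filter once, at selection time, and keeps the result."""

    def __init__(self, matches: Filter, inventory: Iterable[str]) -> None:
        self._members = [name for name in inventory if matches(name)]

    def members(self, inventory: Iterable[str]) -> List[str]:
        return list(self._members)  # later additions are ignored


def policy_filter(name: str) -> bool:
    # Hypothetical filter: match resources by a name prefix.
    return name.startswith("S2STI-TBSMWebCons")


inventory = ["S2STI-TBSMWebCons_67", "Server1"]
dynamic = DynamicResources(policy_filter)
static = StaticResources(policy_filter, inventory)

inventory.append("S2STI-TBSMWebCons_68")  # a new matching resource appears

print(dynamic.members(inventory))  # includes the new resource
print(static.members(inventory))   # still only the original selection
```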
  • Tip: When defining dynamic resources, select the Preview current evaluation filters option in the Filter Resources window to see the resources that currently match the filters.

    SLA Start Date

    You are required to specify the SLA Start Date when creating the SLA. The SLA Start Date can be useful in the following cases:

    – If the SLA that is being created is to be started in the future
      For example, if the SLA must start on a future date, set the start date accordingly. The evaluation of this SLA then starts only from that future date.
    – Evaluate using historical data
      Set the SLA start date in the past. This can help to validate the SLOs set for the resource using the existing infrastructure. For example, if you set the SLA start date in the past, the SLA is evaluated with the existing monitoring data up until the most recent ETL run. This gives you an idea of the SLA results and may help you to determine whether the SLO of that resource can be met using the existing resource. This option is viable only if the information is available in the Tivoli Data Warehouse.
    – Different time zones
      During the creation of the SLA, you can set the time zone of the SLA along with the start date. This sets the start time of the SLA in a different time zone, if required.

    4.4.2 Supporting SLM with IBM Tivoli Service Level Advisor

    This section explains how to take advantage of some of the IBM Tivoli Service Level Advisor features to help support our SLM strategy. The examples in this section assume that the SLAs defined in Chapter 6, “Case study scenario: Greebas Bank” on page 315, are already created. That chapter also contains samples of reports that are used to measure SLM.

    Reports

    In IBM Tivoli Service Level Advisor, the reports are on demand. This means that you can, at any time, obtain a report of what is currently happening with the SLAs. Depending on the type of user that is accessing the reports and the user’s attributes, all the SLAs or a subset of them are available for viewing. The types of reports that are available depend on the variables listed in the following sections.
  • Types of users

    There are three types of report users: operator, executive, and customer. This is particularly important when creating the various report users.

    Figure 4-32 shows the relationship among the various IBM Tivoli Service Level Advisor report users. The provider of services can be the internal IT department or an application service provider. The recipient of services can be the various lines of business inside an enterprise or the users of application services from the application service provider. In either case, there is an SLA between the provider and the recipient of services. Each user looks at the report of this and other SLAs from his or her own perspective.

    Figure 4-32 Report users relationship (provider of services: executive and operator; recipient of services: customer; the SLA links the two sides)

    The operator and the executive belong to the provider organization. They are responsible for providing services to the customer. An SLA exists between the executive and the customer. The executive is responsible for the service, but the operator is the one who takes care of the day-to-day operations to guarantee the service level.

    Therefore, the operator needs maximum detail to diagnose any problems. The executive needs a high-level idea of all the services provided, and the customer needs only the information about his or her own SLAs.

    The following two objects in IBM Tivoli Service Level Advisor are important when dealing with reports:

    Customers are the recipients of service. In an operational level agreement (OLA), customers can distinguish the various internal providers of a service; in an underpinning contract, they can designate the external provider of a service.
  • Realms are sets of customers. Realms can be used to group customers functionally, geographically, and so on. For an example, refer to Chapter 6, “Case study scenario: Greebas Bank” on page 315.

    When creating report users, one way to restrict what the user can see is to limit the SLA information to the SLAs that belong to a specific customer or realm. This is particularly useful when the user type is customer, because you don’t want customers to have access to other customers’ data. You may also want to assign operators to a certain set of customers or realms. When creating customers and realms, take all of this into consideration.

    The user can have three different types of views, as summarized in Table 4-2. The external view cannot see internal-only metrics. It is a good view for a customer user type, because customers should not see the OLA metrics used to support the SLA. This view also allows restriction by customers or realms, because customers should not have information about other customers. The unrestricted view is for operators and managers who are responsible for all the services provided by the IT department or by the service provider. The restricted view can be used when IT operators or managers are responsible for part of the services or the infrastructure and you want to restrict the information to which each one has access.

    Table 4-2 Available views

    View           Can view all?                       Can view internal-only metrics?   Value in the addUser CLI
    Unrestricted   Yes                                 Yes                               1
    Restricted     No, restricted to customer/realm    Yes                               2
    External       No, restricted to customer/realm    No                                3

    You create the users using the IBM Tivoli Service Level Advisor command line interface (CLI) as shown in this example:

      scmd report addUser -name BankingExecutive -view 3 -customer Banking -userType 3

    This command creates a report user called BankingExecutive with an external view. This user is a customer type of user and is restricted to viewing reports of the customer Banking.
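When many report users have to be created, the view codes from Table 4-2 can be kept in one place and the command line generated from them. The Python sketch below only assembles the command string; it uses just the flags that appear in the example above (-name, -view, -customer, -userType). The helper name `add_user_command` is hypothetical, the user type is passed through as a number because the text confirms only that 3 designates a customer user, and treating -customer as optional for an unrestricted view is an assumption that should be checked against the Command Reference.

```python
import shlex
from typing import Optional

# View codes from Table 4-2.
VIEW_CODES = {"unrestricted": 1, "restricted": 2, "external": 3}


def add_user_command(name: str, view: str, user_type: int,
                     customer: Optional[str] = None) -> str:
    """Build an scmd report addUser command line (hypothetical helper)."""
    args = ["scmd", "report", "addUser",
            "-name", name,
            "-view", str(VIEW_CODES[view])]
    if customer is not None:
        # Assumption: -customer can be left out when no restriction applies.
        args += ["-customer", customer]
    args += ["-userType", str(user_type)]
    # Quote any argument that contains spaces or shell metacharacters.
    return " ".join(shlex.quote(arg) for arg in args)


print(add_user_command("BankingExecutive", "external", 3, customer="Banking"))
```

With the arguments shown, the helper reproduces the command from the example.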
    Refer to Command Reference for IBM Tivoli Service Level Advisor, SC32-0833-03, for details about this CLI.

    Types of reports

    Many types of reports are available to IBM Tivoli Service Level Advisor report users. Table 4-3 lists the reports that are available to each user type. These reports include all the SLAs to which a particular user has access.
  • Table 4-3 Available reports by user type

    Report                    Operator   Executive   Customer
    Dashboard
      Customers by Realms     Yes        Default     No
      SLA by Customers        Default    Yes         Default
    Ranking
      SLA                     Yes        Yes         Yes
      SLA Type                Yes        Yes         No
      Customer                Yes        Yes         No
      Realm                   Yes        Yes         No
      Offering Component      Yes        No          No
      Resource                Yes        No          No
    Details
      Overall details         Yes        Yes         Yes
      SLA Results             Yes        No          No
      Trends                  Yes        No          No
      Violations              Yes        No          No

    The dashboard reports are, by default, the first page that a user sees when logging in. They give an overview of the status of all the SLAs to which the user has access or, for an executive user, of all the customers for whom the user is responsible. See Figure 4-37 on page 174 for an example.

    The user can modify the time range or the SLA types listed, using the Filter Criteria section in the report. In this view, the user can easily see where problems or potential problems are and explore details to find the causes. The user does this by clicking the cell that shows the violations or trends (red or yellow cell), which opens the SLA Details view. For more information about the contents of this type of report, see IBM Tivoli Service Level Advisor SLM Reports, SC32-1248.

    Ranking reports (Figure 4-33) consider the number of violations, trends, and SLAs, and display the objects in ranked order. This is used to quickly find the most impacted objects (SLA, SLA type, resource, customer, realm, or offering component). An algorithm defines the rank. For details about the algorithm, see IBM Tivoli Service Level Advisor SLM Reports, SC32-1248.
  • Figure 4-33 Ranking report

    Details reports show more details about a set of SLAs, such as SLO results, trends, and violations.

    Summary graphs

    In some of the reports, summary graphs are displayed. Two sets of graphs can be displayed depending on the type of report that is shown.

    For SLA details or Overall details reports, a pair of graphs is displayed at the top of the page. You can customize the type of graph and choose from the following variables:

    – Metrics or resources
    – Trends or violations
    – Bar or pie chart

    The graph can be displayed for the metrics or the resources with the most trends or violations.
  • For the ranking reports, eight different graphs can be displayed per object type (SLA, SLA type, customer, realm, offering component, and resource):

    – Violations per object
    – Trends per object
    – Violations per time period
    – Trends per time period
    – Violations and trends per object
    – Rank per object
    – Top objects with the most violations
    – Top objects with the most trends

    Figure 4-34 shows two examples of summary graphs. One example of using a ranking report is an executive who wants to know which resources contributed most to violations in the last month.

    Figure 4-34 Summary graph

    Changes to service level agreements

    According to ITIL, SLM is a dynamic process with constant reviews and improvements. In addition, the infrastructure is dynamic and can change and evolve with time. The following sections show two change situations: changing SLOs and replacing resources. They also show how IBM Tivoli Service Level Advisor can handle them.
  • Changing service level objectives

    The first situation is when the SLOs are changed. This can happen in a regular SLA review. To set up the new service levels, create a new offering based on the original one (using the IBM Tivoli Service Level Advisor Create Like feature) and replace the offering in the SLA. Refer to Chapter 6, “Case study scenario: Greebas Bank” on page 315.

    Replacing resources

    The second situation is when a resource is replaced. For example, Server1 breaks and is replaced by Server2. In this case, the monitoring application that is monitoring Server1 should start monitoring Server2 as well. Then run the ETLs for both the monitoring application and IBM Tivoli Service Level Advisor. After this, you can see a reference to Server2 during the Replace Resource task in IBM Tivoli Service Level Advisor.

    For example, consider that you want to replace S2STI-TBSMWebCons_67 with the Step_1... resource as shown in Figure 4-45 on page 185. Follow these steps:

    1. Log in to the IBM Tivoli Service Level Advisor administrator’s console.
    2. Click Administer SLA →Replace Resource.
    3. In the Find Resource window, click Browse.
    4. In the Select Resource Type window, select Transaction Node and click Next.
    5. In the Create Filter window, complete these tasks:
       a. Click Create Filter.
       b. In the Attribute field, select Transaction Management Policy.
       c. In the Value field, type S2STI-TBSMWebCons.
       d. Click Next.
    6. In the Select Resources window, select S2STI-TBSMWebCons_67 and click Next.
    7. In the Find Resource window, click Next.
    8. In the Replace Resource window, repeat steps 3 to 7, but now choose the Step_1... resource. Select Online Accounts Trend SLA and click Finish.
    9. In the Track Updated SLAs window, verify that the SLA is there and click Close.

    This way the resources are replaced in the Online Accounts Trend SLA.

    Adjudication

    IBM Tivoli Service Level Advisor provides a way to adjudicate violations.
    In the SLA, you can specify situations where a violation can be adjudicated. For example, one situation can be that the service level is guaranteed only up to a certain number of users connected to an application running in WebSphere.
  • You can use the IBM Tivoli Monitoring for Web Infrastructure live servlet sessions metric to monitor the number of sessions in a given server. When the number of sessions exceeds a certain breach value, you receive a violation. This metric can be created in IBM Tivoli Service Level Advisor as an internal one, so that the customer does not receive the violation event. With this, you have a well-documented way to justify the adjudication.

    To adjudicate any violation, follow these steps:

    1. Log in to the IBM Tivoli Service Level Advisor administrator console.
    2. Click Administer SLAs →Manage Violations.
    3. In the Manage Violations window, select the violation that is to be excluded and click Exclude.
    4. In the Exclude Violation window, write the reason for excluding the violation and click OK.

    Tiered SLAs

    IBM Tivoli Service Level Advisor has the capability to combine one or more SLAs into another one. Here you use this to create an SLA that includes all three banking SLAs. If any of these SLAs has a violation, the Banking SLA shows a violation. You also link this to the Banking business service, so that the Banking business system icon in the IBM Tivoli Business Systems Manager executive console shows any violations in any of the Banking services.

    1. In the IBM Tivoli Service Level Advisor administrator console, click Administer Offerings →Create Offering.
    2. In the Name Offering window, complete these tasks:
       a. For Name, type Banking Offering.
       b. For Description, type This offering includes all the SLAs in the Banking business unit.
       c. Click Next.
    3. In the Select SLA Type window, click Next.
    4. In the Include SLAs window, click Add.
    5. In the Select SLAs window, select all three SLAs:
       – Online Accounts
       – Interbank Transfers
       – Account Application
       Then click OK.
  • 6. In the Include SLAs window (Figure 4-35), click Next.

    Figure 4-35 Include SLAs window

    7. In the Select Business Schedule window, select 24 x 7 schedule and click Next.
    8. In the next panels, click Next until you see the Summary window. Don’t include any offering components.
    9. In the Summary window, select Publish the offering and click Finish.

    To create the SLA, follow these steps:

    1. Click Administer SLA →Create SLA.
    2. In the Name SLA window, in the SLA Name field, add Banking SLA and click Next.
  • 3. In the Select Customer window, select Banking and click Next.
    4. In the Select Service window, select Banking and click Next.
    5. In the Select Offering window, select Banking Offering and click Next.
    6. In the Select SLA Start Date window, click Next.
    7. In the Summary window, click Finish.

    Now look at the reports for this SLA. Log in to the IBM Tivoli Service Level Advisor Reports interface as the SLA Administrator. Then click one of the cells of the Banking SLA. You see the Banking SLA with the three other SLAs that it contains, as shown in Figure 4-36.

    Figure 4-36 SLA details
  • If you go back to the high-level report, you see that the violations on two of the component SLAs are reflected on the Banking SLA, which is the parent. Two of the component SLAs have one violation each, and the Banking SLA has two. Each violation of a component SLA is reflected in the parent, or tiered, SLA as shown in Figure 4-37.

    Figure 4-37 Reports dashboard

    Details of what is seen for SLA violations are given in the case study scenarios presented in Part 2, “Case study scenarios” on page 195.

    If a violation or trend is propagated to this SLA from one of the associated ones, this event is sent to IBM Tivoli Business Systems Manager to be shown in the executive dashboard, where it is associated with the Banking business system.
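The roll-up of violations from component SLAs to a tiered SLA can be pictured as a sum over a tree. The Python sketch below is a conceptual model, not the product's evaluation logic: the `SLA` class is hypothetical, and which two of the three component SLAs carry a violation is chosen arbitrarily here, because the text states only the counts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SLA:
    """Hypothetical model of an SLA that can include other SLAs."""
    name: str
    own_violations: int = 0
    children: List["SLA"] = field(default_factory=list)

    def total_violations(self) -> int:
        # A tiered SLA reflects every violation of the SLAs it contains.
        return self.own_violations + sum(
            child.total_violations() for child in self.children
        )


# Illustrative assignment: two component SLAs with one violation each.
online_accounts = SLA("Online Accounts", own_violations=1)
interbank_transfers = SLA("Interbank Transfers", own_violations=1)
account_application = SLA("Account Application")

banking = SLA("Banking SLA", children=[
    online_accounts, interbank_transfers, account_application,
])

print(banking.total_violations())  # 2
```

The recursion also covers deeper nesting, in case a tiered SLA is itself included in another one.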
  • Maintenance schedule

    It is important to schedule preventive maintenance from time to time. Be sure to include a maintenance window in the signed SLA.

    The maintenance, in this case, should happen every three months on a Sunday, from 0:00 a.m. to 2:00 a.m. To define this to IBM Tivoli Service Level Advisor, the only prerequisite is that the maintenance window is in the future.

    The process to assign a maintenance window is to create a new schedule with a No Service period defined to cover the maintenance window and to replace the existing schedule with it. Assume that today is 12 October 2004 and you want the maintenance to happen on 19 December 2004 from 0:00 a.m. to 2:00 a.m. Also assume that you want to do this maintenance on the resources that support the Online Banking service.

    Changing the schedule

    The 24 x 7 schedule cannot be changed because it is used in some offerings. Therefore, create another schedule based on the existing one. Follow these steps:

    1. In the Administrator Console, click Administer Offerings →Manage Schedules.
    2. In the Manage Schedules window, select 24 x 7 schedule and click Create Like.
    3. In the Name Schedule window, complete these tasks:
       a. For Name, type 24 x 7 20041219M schedule.
       b. For Schedule Description, explain that the schedule is the same as the 24 x 7 schedule, except that it has a maintenance (no service) window on 19 December 2004 from 0:00 a.m. to 2:00 a.m.
       c. Click Next.
    4. In the Select Schedule Type window, click Next.
    5. In the Include Auxiliary Schedules window, click Next.
    6. In the Define Periods window, the original Critical period is already there. Add a No Service period. Click Create.
  • 7. In the Create Period window (Figure 4-38), complete these tasks:
       a. In the Frequency field, select Single Date.
       b. The window changes to show the options relative to Single Date.
          i. In the State field, select No Service.
          ii. Keep the Time Zone and Start Time as the default.
          iii. In the End Time field, select 01:59.
          iv. In the Date field, type 12/19/2004 or use the calendar icon on the right side of the field.
          v. Click OK.

    Figure 4-38 Maintenance period
  • 8. You return to the Define Periods window (Figure 4-39). The difference is that you added the No Service period. Click Next.

    Figure 4-39 Modified schedule

    9. In the Summary window, click Finish.
  • In the Manage Schedules window (Figure 4-40), you see the added schedule.

    Figure 4-40 Schedules

    Replacing the schedule

    Now replace this schedule in the Online Banking Offering.

    Tip: As a general rule, create only one SLA for each offering. There are situations, for example, where the same type of service is provided to many different customers, using different resources. They have the same metrics, breach values, and schedules. In this case, using the same offering as a base for many SLAs can lead to confusion and unnecessary complexity.

    1. Click Administer Offerings →Manage Offerings.
    2. In the Manage Offerings window, select Online Accounts Trend Offering and click Change.
  • 3. In the Associate SLAs window (Figure 4-41), in the task list, click Select Compatible Business Schedule.

    Figure 4-41 Offering tasks
  • 4. In the Compatible Business Schedule window (Figure 4-42), select 24 x 7 20041219M schedule and click Next.

    Figure 4-42 Select compatible business schedule

    5. Continue clicking Next until you reach the Summary window.
    6. In the Summary window (Figure 4-43), at the bottom, there is a table with all the SLAs that are affected by this change. Click Finish.

    Figure 4-43 Affected SLAs
  • 7. In the Track Updated SLAs window, you see a table similar to the one in Figure 4-43 for tracking the SLAs that are affected by the change to this offering. Click Close.

    Now the maintenance window is included. At the end of the month (monthly SLA period), the SLA is calculated taking this maintenance period into account.

    Adding a maintenance schedule period using CLI

    You can perform the same operation using a CLI. You must follow this set of rules when running this operation from the CLI:

    – The schedule period should be present in the Business/Auxiliary schedule to which this period is going to be added, and a breach value should be defined.
    – A No Service period can be added even if it is not present in the existing Business/Auxiliary schedule.
    – The schedule period can be added only for a single date in the future.
    – The schedule period can be on the same day if the time is in the future.
    – The schedule period cannot be added for a past time or date.
    – The schedule period cannot span two dates, even if the period is less than 24 hours. If the span must cover two days, add two schedule periods.

    The CLI usage is as follows:

      scmd mem addSingleSchedulePeriod -schedule <schedule name> -date <YYYY MM DD> -startHour <HH> -endHour <HH> -state <1-Critical | 2-Peak | 3-Prime | 4-Standard | 5-Low Impact | 6-Off Hours | 7-No Service>

    Here is an example of the command:

      scmd mem addSingleSchedulePeriod -schedule "IRB Trade Business Schedule" -date 2004 11 21 -startHour 05 -endHour 12 -state 7

    This adds a No Service state on 21 November 2004 between 05:00 hours and 12:00 hours. This CLI is helpful if you must suddenly set up a maintenance period by adding a No Service period.

    Trends

    Another SLM tool in IBM Tivoli Service Level Advisor is the use of trends. Trends are automatically calculated for all the metrics selected for an SLA. To improve this capability, you can add another metric. This section explains how to add another metric, as an example, and how to set the metric for trending analysis. The metric collects the performance of the same resource in IBM Tivoli Monitoring for Transaction Performance that is feeding an IBM Tivoli Business Systems Manager business system.
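Returning to the addSingleSchedulePeriod rules listed earlier in this section, the date and time rules can be checked before the command is issued. The Python sketch below is a hedged pre-check, not part of the product: `check_single_period` and `build_command` are hypothetical helpers, the assumption that hours range from 0 to 24 is not confirmed by the text, and the rule about the state being present in the schedule with a breach value is not covered because it needs the schedule definition itself.

```python
from datetime import date, datetime, time
from typing import List

# State codes from the CLI usage line.
STATES = {1: "Critical", 2: "Peak", 3: "Prime", 4: "Standard",
          5: "Low Impact", 6: "Off Hours", 7: "No Service"}


def check_single_period(day: date, start_hour: int, end_hour: int,
                        state: int, now: datetime) -> List[str]:
    """Return a list of rule violations; an empty list means no problem found."""
    problems = []
    if state not in STATES:
        problems.append("unknown state code")
    if not (0 <= start_hour < end_hour <= 24):  # hour range is an assumption
        problems.append("period must start before it ends on the same date; "
                        "split a span over midnight into two periods")
    elif datetime.combine(day, time(start_hour)) <= now:
        problems.append("period must start in the future")
    return problems


def build_command(schedule: str, day: date, start_hour: int,
                  end_hour: int, state: int) -> str:
    """Format the command line using the flags shown in the usage line."""
    return (f'scmd mem addSingleSchedulePeriod -schedule "{schedule}" '
            f"-date {day:%Y %m %d} -startHour {start_hour:02d} "
            f"-endHour {end_hour:02d} -state {state}")


now = datetime(2004, 10, 12, 8, 0)  # "today" in the maintenance example
if not check_single_period(date(2004, 11, 21), 5, 12, 7, now):
    print(build_command("IRB Trade Business Schedule",
                        date(2004, 11, 21), 5, 12, 7))
```

Passing `now` explicitly keeps the check deterministic; a script would normally pass `datetime.now()`.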
  • We already created the original SLA, Online Banking SLA. Now we modify this SLA to include this new metric and enhance the trend. For this, we include the same resource that is feeding events to the resources under Real-time Online Account Transactions.
    The first stage is to modify the offering. Because IBM Tivoli Service Level Advisor does not allow you to add new service offering components, create another offering using the original one as a base. IBM Tivoli Service Level Advisor behaves this way because the published offering can be assigned to SLAs other than the one you want to modify. Changing it could alter those SLAs unintentionally.
    Creating the online accounts trend offering
    To modify the offering, follow these steps:
    1. On the IBM Tivoli Service Level Advisor Administrator Console, click Administer Offerings → Manage Offerings.
    2. In the Manage Offerings window, select Online Accounts Offering 20041001 and click Create Like. This creates a copy of the Online Banking Offering.
    3. In the Name Offering window, complete the following tasks:
       a. In the Offering Name field, add Online Accounts Trend Offering.
       b. In the Offering Description field, add This offering will add the performance metric to improve trend capability.
       c. Click Next.
    4. In the Select SLA Type window, select External and then click Next.
    5. In the Include SLAs window, click Next.
    6. In the Select Business Schedule window, click Next.
    7. In the Include Offering Components window, click Add.
    8. In the Select Resource Type window (Figure 4-44), you see the resource that is under Real-time Online Account Transaction Business System. If you examine the details of this resource, you see that events are being sent from IBM Tivoli Monitoring for Transaction Performance to this resource. You also see that the name of the management policy is S2STI-TBSMWebCons.
       Because Transaction Node is the resource type used by IBM Tivoli Monitoring for Transaction Performance, select Transaction Node and click Next.
  • Figure 4-44 Real-time online account transactions resource
    9. In the Include Metrics window, click Add.
    10. In the Select Metrics window, select Response Time and click Next.
    11. In the Define Breach Values window, complete these tasks:
       a. As defined in OLA, in the Average files field, type 10.
       b. For Keep Violation Condition with, select Actual average greater than supplied average.
       c. Click Next.
    12. In the Evaluation Frequency window, complete these tasks:
       a. In Access to Results, select Internal Use Only. We don’t want business executives outside of the business unit to see this.
       b. In Evaluation Frequency, select Monthly.
       c. In Advanced Metric Settings, select Configure advanced metric settings.
       d. Click Next.
    13. In the Advanced Metric Settings window, complete these tasks:
       a. In Intermediate Evaluations, select Perform intermediate evaluations.
       b. Still in Intermediate Evaluations, keep the Daily selection.
  • c. In Trend Analysis, select Current evaluation Period Only.
       d. Click Finish.
    14. In the Include Metrics window, click Next.
    15. In the Name Offering Component window, in the Offering Component field, add Online account response time. Click Next.
    16. In the Include Offering Components window, click Next.
    17. In the Summary window, select Publish the offering and click Finish.
    Creating the online accounts trend SLA
    Follow these steps to create the online accounts trend SLA:
    1. In the Administrator’s Console, click Administer SLAs → Create SLA.
    2. In the Name SLA window, complete these tasks:
       a. For SLA Name, type Online Accounts Trend SLA.
       b. For SLA Description, type This SLA contains the extra performance metric.
       c. Click Next.
    3. In the Select Customer window, select Banking and click Next.
    4. In the Select Service window, select Real Time Online Account Transactions. Then click Next.
    5. In the Select Offering window, select Online Accounts Trend Offering. Then click Next.
    6. In the Add Resources to Business System Availability window, follow the same procedure as explained in “Selecting the resource” on page 163 and in Chapter 6, “Case study scenario: Greebas Bank” on page 315.
    7. In the Add Resources to Online Account Response Time window, click Add.
    8. In the Select Resource List Type window, select Static Resource List. Then click Next.
    9. In the Filter Resources window, the name of the management policy is S2STI-TBSMWebCons. To select the resource that corresponds to this policy, follow these steps:
       a. Click Create Filter.
       b. A new row is displayed in the Resource Filters table. In this first row, under the Attribute column, click the arrow on the right side of the field and select Transaction Management Policy from the list.
  • c. In the Value field, add any part of the name of the transaction management policy, for example, S2STI-TBSM.
       d. Click Next.
    10. In the Select Resources window (Figure 4-45), you see Step_1_..., which is a subtransaction of the other transaction. Select S2STI-TBSMWebCons_67 and click Next.
    Figure 4-45 Filter Results
    11. In the Add Resources to Online Account Response Time window, click Next.
    12. In the Select SLA Start Date window, complete these tasks:
       a. Make this SLA valid for the next month. In the SLA Start Date, specify the first day of the next month.
       b. Click Recalculate First Evaluation Dates.
       c. Click Next.
    13. In the Summary window, click Finish.
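    To make the settings above more concrete, the following sketch illustrates the breach condition "actual average greater than supplied average" with a breach value of 10, checked at a daily intermediate evaluation within a monthly period. The trend check is only a stand-in: this section does not describe how IBM Tivoli Service Level Advisor calculates trends, so a simple straight-line extrapolation is assumed here. All function names and result strings are hypothetical.

```python
from statistics import mean


def projected_period_average(daily_averages, days_in_period):
    """Extend the observed daily averages with a least-squares straight line
    and return the average over the whole evaluation period.

    Assumption: a linear extrapolation, not the product's trend algorithm.
    """
    n = len(daily_averages)
    if n < 2:
        return mean(daily_averages)
    x_bar = (n - 1) / 2
    y_bar = mean(daily_averages)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in enumerate(daily_averages))
        / sum((x - x_bar) ** 2 for x in range(n))
    )
    intercept = y_bar - slope * x_bar
    # Predicted values for the days of the period not yet observed.
    future = [intercept + slope * x for x in range(n, days_in_period)]
    return mean(list(daily_averages) + future)


def intermediate_evaluation(daily_averages, breach_value=10.0, days_in_period=30):
    """Classify the SLA state at a daily intermediate evaluation."""
    # Breach condition from the offering: actual average > supplied average.
    if mean(daily_averages) > breach_value:
        return "violation"
    if projected_period_average(daily_averages, days_in_period) > breach_value:
        return "trend toward violation"
    return "ok"


if __name__ == "__main__":
    # Response times still under the breach value but rising every day.
    print(intermediate_evaluation([6, 7, 8, 9]))
```

    With rising response times, the sketch reports a trend before the average itself crosses the breach value. That early warning is the purpose of combining intermediate evaluations with trend analysis.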
  • Escalating the SLA events
    IBM Tivoli Service Level Advisor provides the ability for event escalation. The types of events are violation of SLA, trending toward a violation for SLA, trend cancel for SLA, and application event. IBM Tivoli Service Level Advisor also provides the ability to configure additional messages to be escalated using the following CLI command:
    scmd log handler eventWatcher
    The escalation message can be any of the following forms:
       E-mail message
       Simple Network Management Protocol (SNMP) event
       TEC event
    To enable TEC event escalation with service details when violation or trending toward violation occurs, load the sample ruleset provided with the SLM Event class definitions into the TEC Rule base. See Command Reference for IBM Tivoli Service Level Advisor, SC32-0833-03, for details to customize and enable the event escalation.
    You can toggle on or off the event escalation for parent SLAs in the tiered SLA using the CLI:
    scmd escalate parentSLAEscalation {true|false}
    This disables any violation or trending toward violation event escalation to TEC. Load the sample TEC rule, slmDropParentEvents.rls, that is provided into TEC. After the rule is loaded and event escalation is switched on using the CLI, the parent SLA events can be controlled for escalation.
    4.4.3 Realistic expectations for real-time SLAs
    To be as close to real time as possible, you can reduce the evaluation period to as little as one hour. The limit on how low you can go depends on how fast the source, IBM Tivoli Service Level Advisor ETLs, and SLA evaluation can be run. Refer to “Frequency of ETL runs” on page 152 for details.
    4.4.4 Integrating IBM Tivoli Service Level Advisor with IBM Tivoli Business Systems Manager
    Section 4.4, “IBM Tivoli Service Level Advisor V2.1” on page 156, introduces the concept of loading IBM Tivoli Business Systems Manager data into Tivoli Data Warehouse and extracting it to IBM Tivoli Service Level Advisor.
    This enables IBM Tivoli Service Level Advisor to use IBM Tivoli Business Systems Manager data to calculate SLA metrics. In “Escalating the SLA events” on page 186, you can learn how to send IBM Tivoli Service Level Advisor events to TEC. In “Executive dashboard” on page 130, you learn how the IBM Tivoli Business Systems Manager executive dashboard can receive IBM Tivoli Service Level Advisor events.
    This section describes the process to pass IBM Tivoli Service Level Advisor events from TEC into IBM Tivoli Business Systems Manager.
    Getting IBM Tivoli Service Level Advisor events into IBM Tivoli Business Systems Manager executive dashboard
    For IBM Tivoli Service Level Advisor events to show in the correct icon on the IBM Tivoli Business Systems Manager executive dashboard, you must perform the following actions:
    1. Place IBM Tivoli Business Systems Manager data into IBM Tivoli Service Level Advisor (TSLA). This is detailed in 4.4, “IBM Tivoli Service Level Advisor V2.1” on page 156.
    2. Mark the IBM Tivoli Business Systems Manager business system as a service.
    3. Build an SLA or SLAs around services defined in IBM Tivoli Business Systems Manager. This is detailed in “Building SLAs” on page 162.
    4. Enable TSLA → TEC → TBSM event traffic and display it in the TEC console.
    The following sections explain how to mark IBM Tivoli Business Systems Manager business services as a service. They also explain how to enable IBM Tivoli Service Level Advisor to send event data, using TEC, to IBM Tivoli Business Systems Manager for display in executive dashboard views.
    Marking an IBM Tivoli Business Systems Manager business system as a service
    The concept of services is shared between IBM Tivoli Business Systems Manager, Tivoli Data Warehouse, and IBM Tivoli Service Level Advisor. Basically, an entity defined as a service in IBM Tivoli Business Systems Manager will be a service within Tivoli Data Warehouse. It is also available as a service for selection during the SLA definition process in IBM Tivoli Service Level Advisor.
    Marking a resource as a service within IBM Tivoli Business Systems Manager can be done for both business systems and individual objects within a business system.
    Note that objects that are not in business systems cannot be marked as services.
  • To mark a resource as a service, click the resource’s Properties tab and select the Executive View tab. This opens the Executive Dashboard panel (Figure 4-46) for defining a resource as a service.
    Figure 4-46 Executive dashboard window
    The Executive Dashboard panel contains two check boxes and five text fields to complete (starting from the top of the right pane in Figure 4-46):
    Executive Dashboard Service check box
       Selecting this box marks the resource as a service and eligible to appear as a service in the executive dashboard. Selecting this box also defines the resource as a service within Tivoli Data Warehouse and IBM Tivoli Service Level Advisor.
    Name of Service text field
       This is pre-filled with the name of the resource.
    Service Identified text field
       This is also pre-filled with the name of the resource. This is a unique identifier field. Once you set it, you cannot change it. This is so that the data going to Tivoli Data Warehouse is consistent even if the name of the BSV is changed.
  • Business Role of Service text field
       This field is a free-form text field that is used to describe the service. Values that have already been placed in this field for other resources are available from the drop-down list to the right of the text field.
    Business Impact for Red Alerts field
       This field is for defining the impact upon this service when a red event is received.
    Business Impact for Yellow Alerts field
       This field is for defining the impact upon this service when a yellow event is received.
    SLA Supported check box
       This check box enables the secondary indicator in the executive dashboard icon for the service.
    When you select this option, and the ETLs have run, the IBM Tivoli Business Systems Manager resource is a service resource within IBM Tivoli Service Level Advisor.
    Enable TSLA → TEC → TBSM event traffic
    The IBM Tivoli Business Systems Manager executive dashboard is notified by TEC. TEC receives IBM Tivoli Service Level Advisor events as part of IBM Tivoli Service Level Advisor setup. To have TEC forward events to IBM Tivoli Business Systems Manager, you must update the TEC rulebase. You do this by running a script that is provided with the IBM Tivoli Business Systems Manager code that is installed on TEC. The script is:
    %BINDIR%TDSEventServiceconfigtbsmtslatbsmtsla.sh
    Running this script sets up everything. After this is done, IBM Tivoli Service Level Advisor events are sent to IBM Tivoli Business Systems Manager. If the events are for a service that is represented in the executive dashboard, the IBM Tivoli Service Level Advisor icons show that there are outstanding violations or trends.
    You only need to perform this process once for each TEC feeding into IBM Tivoli Business Systems Manager. Figure 4-47 shows an executive dashboard that has non-viewed SLA violations (red square) and viewed SLA trends (blue arrow).
    Figure 4-47 IBM TSLA notifications on a business system icon
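    The behaviour described in this section can be pictured with a small data model: the service identifier is fixed once set so that warehouse data stays consistent across renames, and an SLA event only changes the dashboard icon when it targets a known service with SLA support enabled. The sketch below is illustrative only. The class, the function, the event type strings, and the example identifier are hypothetical and do not correspond to any IBM Tivoli Business Systems Manager API.

```python
from dataclasses import dataclass, field


@dataclass
class DashboardService:
    """Hypothetical model of a resource marked as a service."""
    service_id: str                  # unique identifier, fixed once set
    name: str                        # display name, may be changed later
    sla_supported: bool = False      # enables the secondary SLA indicator
    indicators: set = field(default_factory=set)

    def __setattr__(self, key, value):
        # Refuse to change the identifier after it has been set once.
        if key == "service_id" and "service_id" in self.__dict__:
            raise AttributeError("service identifier cannot be changed")
        super().__setattr__(key, value)


def route_event(services, service_id, event_type):
    """Record an SLA event on the matching service icon.

    Returns False when the event does not target a known service
    that has SLA support enabled.
    """
    service = services.get(service_id)
    if service is None or not service.sla_supported:
        return False
    service.indicators.add(event_type)   # for example "violation" or "trend"
    return True


if __name__ == "__main__":
    svc = DashboardService("RTOAT", "Real-time Online Account Transactions",
                           sla_supported=True)
    svc.name = "Online Account Transactions"   # renaming keeps the identifier
    services = {svc.service_id: svc}
    route_event(services, "RTOAT", "violation")
    print(svc.service_id, sorted(svc.indicators))
```

    Keying the lookup on the identifier rather than the name is the point of the sketch: a renamed service still receives its events.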
  • 4.5 Additional products supporting SLM
    This section provides a brief description and information about additional products, mainly IBM Tivoli monitoring applications, that contribute to the SLM solution.
    4.5.1 IBM Tivoli Monitoring for Transaction Performance
    Chapter 3, “IBM Tivoli products that assist in service level management” on page 53, introduces IBM Tivoli Monitoring for Transaction Performance. It is used for monitoring user transactions on Web and desktop-based applications. It is useful for SLM because the user-experience events from IBM Tivoli Monitoring for Transaction Performance supplement the resource-specific events from IBM Tiv