Service Level Management Using IBM Tivoli Service Level Advisor and Tivoli Business Systems Manager (SG24-6464)


Front cover

Service Level Management Using IBM Tivoli Service Level Advisor and Tivoli Business Systems Manager

Integrate Tivoli Business Systems Manager and Tivoli Service Level Advisor
Map business service management to service level management
Achieve proactive service level management

Edson Manoel, Kimberly Cox, Eswara Kosaraju, Matt Roseblade, Alex Shafir, Venkat Surath, Eduardo Tanaka, Brian Watson

ibm.com/redbooks
International Technical Support Organization

Service Level Management Using IBM Tivoli Service Level Advisor and Tivoli Business Systems Manager

December 2004

SG24-6464-00
Note: Before using this information and the product it supports, read the information in “Notices” on page ix.

First Edition (December 2004)

This edition applies to IBM Tivoli Business Systems Manager V3.1, IBM Tivoli Service Level Advisor V2.1, IBM Tivoli Enterprise Console V3.9, and IBM Tivoli Monitoring for Transaction Performance V5.3 products.

Note: This book is based on a pre-GA version of a product and may not apply when the product becomes generally available. We recommend that you consult the product documentation or follow-on versions of this redbook for more current information.

© Copyright International Business Machines Corporation 2004. All rights reserved.

Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents

Notices
Trademarks
Preface
  The team that wrote this redbook
  Become a published author
  Comments welcome

Part 1. Fundamentals

Chapter 1. Introduction to service level management
  1.1 Service level management overview
  1.2 Service level management benefits
  1.3 Service level management components
    1.3.1 Processes
    1.3.2 Documentation
    1.3.3 People
    1.3.4 Tools
  1.4 Business service management approach to service level management
    1.4.1 Convergence of business service management and service level management
  1.5 Improving service level management through integration
  1.6 Scope of this book

Chapter 2. General approach for implementing service level management
  2.1 A look at the ITIL process improvement model
  2.2 Planning for service level management implementation
    2.2.1 Identifying roles and responsibilities
    2.2.2 Understanding the services
    2.2.3 Assessing the ability to deliver
  2.3 Implementing service level management
    2.3.1 Developing service level objectives
    2.3.2 Negotiating on service level agreements
    2.3.3 Implementing service level management tools
    2.3.4 Establishing a reporting function
    2.3.5 Adjusting IT processes to include service level management
  2.4 Ongoing service level management program
    2.4.1 Maintenance of service definitions
    2.4.2 Service level agreement management via historical reporting
    2.4.3 Priority management of real-time faults
  2.5 Continuous improvement
    2.5.1 Improving quality of service levels
    2.5.2 Improving efficiency of service level management
    2.5.3 Improving effectiveness of service level management

Chapter 3. IBM Tivoli products that assist in service level management
  3.1 IBM Tivoli product mapping
    3.1.1 The monitoring and measurement layer
    3.1.2 The service level management layer
  3.2 IBM Tivoli Business Systems Manager
    3.2.1 Business goals
    3.2.2 High level description and main functions
    3.2.3 Benefits of using IBM Tivoli Business Systems Manager
    3.2.4 Key concepts in IBM Tivoli Business Systems Manager
    3.2.5 IBM Tivoli Business Systems Manager architecture
  3.3 IBM Tivoli Data Warehouse
    3.3.1 Business goals
    3.3.2 High level description and main functions
    3.3.3 Benefits of using Tivoli Data Warehouse
    3.3.4 Key concepts in Tivoli Data Warehouse
    3.3.5 Tivoli Data Warehouse architecture
  3.4 IBM Tivoli Service Level Advisor
    3.4.1 Business goals
    3.4.2 High level description and main functions
    3.4.3 Benefits of using IBM Tivoli Service Level Advisor
    3.4.4 Key concepts in IBM Tivoli Service Level Advisor
    3.4.5 IBM Tivoli Service Level Advisor architecture
  3.5 IBM Tivoli Monitoring for Transaction Performance
    3.5.1 Business goals
    3.5.2 High level description and main functions
    3.5.3 Benefits of using IBM Tivoli Monitoring for Transaction Performance
    3.5.4 Key concepts in IBM Tivoli Monitoring for Transaction Performance
    3.5.5 IBM Tivoli Monitoring for Transaction Performance architecture
  3.6 IBM Tivoli Enterprise Console
    3.6.1 Business goals
    3.6.2 High level description and main functions
    3.6.3 Benefits of using IBM Tivoli Enterprise Console
    3.6.4 Key concepts of event groups in IBM Tivoli Enterprise Console
    3.6.5 IBM Tivoli Enterprise Console architecture
  3.7 IBM Tivoli Monitoring
    3.7.1 Business goals
    3.7.2 High level description and main functions
    3.7.3 Benefits of using IBM Tivoli Monitoring
    3.7.4 Key concepts in IBM Tivoli Monitoring
    3.7.5 IBM Tivoli Monitoring architecture
  3.8 Bringing it all together in support of SLM processes
    3.8.1 Service definitions
    3.8.2 Real-time monitoring
    3.8.3 Historical monitoring
    3.8.4 Fault management
    3.8.5 SLA reporting and alerting
    3.8.6 Problem and change management

Chapter 4. Planning to implement service level management using Tivoli products
  4.1 Implementing SLM using Tivoli products
    4.1.1 Planning
    4.1.2 Implementation
    4.1.3 Ongoing SLM program
    4.1.4 Improvement process
  4.2 IBM Tivoli Business Systems Manager V3.1
    4.2.1 Propagation, alerts, and events
    4.2.2 Basic business system building
    4.2.3 Best practices for business system building
    4.2.4 IBM Tivoli Business Systems Manager business system types
    4.2.5 IBM Tivoli Business Systems Manager views in an SLM context
    4.2.6 IBM Tivoli Business Systems Manager roles in an SLM context
    4.2.7 Understanding your services
    4.2.8 Using IBM Tivoli Business Systems Manager 3.1 features for the benefit of SLM
    4.2.9 Using PBT and RLP to manage high availability scenarios
  4.3 Tivoli Data Warehouse V1.2
  4.4 IBM Tivoli Service Level Advisor V2.1
    4.4.1 Building SLAs in IBM Tivoli Service Level Advisor
    4.4.2 Supporting SLM with IBM Tivoli Service Level Advisor
    4.4.3 Realistic expectations for real-time SLAs
    4.4.4 Integrating IBM Tivoli Service Level Advisor with IBM Tivoli Business Systems Manager
  4.5 Additional products supporting SLM
    4.5.1 IBM Tivoli Monitoring for Transaction Performance
    4.5.2 IBM Tivoli Monitoring for Operating Systems
    4.5.3 IBM Tivoli Monitoring for Databases
    4.5.4 IBM Tivoli Monitoring for Web Infrastructure

Part 2. Case study scenarios

Chapter 5. Case study scenario: IRBTrade Company
  5.1 Background of the business and its current issues
    5.1.1 The business perspective
    5.1.2 The Information Technology perspective
  5.2 Existing IT infrastructure
    5.2.1 Systems environment
    5.2.2 Systems management
    5.2.3 Reporting
  5.3 A service level management solution
    5.3.1 Where we want to be
    5.3.2 Where we are now
    5.3.3 How we will get there
    5.3.4 How we will know we have arrived
  5.4 Implementation
    5.4.1 Additional instrumentation required
    5.4.2 Identifying the business service
    5.4.3 Identifying necessary user roles
    5.4.4 Required resource types
    5.4.5 Creating business systems based on business functions
    5.4.6 Defining executive dashboard views
    5.4.7 Agreeing to and defining service level objectives
    5.4.8 Identifying metrics
    5.4.9 Enabling data sources in IBM Tivoli Service Level Advisor
    5.4.10 Setting up schedules, realms, and customers
    5.4.11 Setting up offerings
    5.4.12 Setting up SLA in IBM Tivoli Service Level Advisor
  5.5 How the new solution works in practice
  5.6 Continuous improvement

Chapter 6. Case study scenario: Greebas Bank
  6.1 Background to the business and its current issues
    6.1.1 The business unit perspective
    6.1.2 IT management perspective
  6.2 Existing IT infrastructure
    6.2.1 Systems environment
    6.2.2 Systems management
    6.2.3 Existing service level management
    6.2.4 Business service management
  6.3 A service level management solution
    6.3.1 Where we want to be
    6.3.2 Where we are now
    6.3.3 How we will get there
    6.3.4 How we will know we have arrived
  6.4 Implementation
    6.4.1 Stage 1: Defining services
    6.4.2 Stage 2: Enhancing instrumentation
    6.4.3 Stage 3: Determining users and roles
    6.4.4 Stage 4: Determining IBM Tivoli Business Systems Manager resource types
    6.4.5 Stage 5: Creating IBM Tivoli Business Systems Manager business systems
    6.4.6 Stage 6: Creating IBM Tivoli Business Systems Manager views
    6.4.7 Stage 7: Agreeing to service level agreement objectives
    6.4.8 Stage 8: Defining metrics
    6.4.9 Stage 9: Preparing for ETLs
    6.4.10 Stage 10: Preparing IBM Tivoli Service Level Advisor
    6.4.11 Stage 11: Creating offerings
    6.4.12 Stage 12: Creating SLAs and OLAs
    6.4.13 Stage 13: SLA reporting
  6.5 How the SLM solution works in practice
    6.5.1 Example 1: Component failure without loss of service
    6.5.2 Example 2: Component failure terminates a service
    6.5.3 Root cause analysis
    6.5.4 Assessing the SLM solution
  6.6 Continuous improvement

Part 3. Appendixes

Appendix A. Service management and the ITIL
  The ITIL
  Service management
    Service delivery
    Service support
  Service support disciplines
    Configuration management
    Service desk
    Incident management
    Problem management
    Change management
    Release management
  Service delivery disciplines
    Capacity management
    Availability management
    Financial management for IT services
    IT service continuity management
    Service level management
  Bringing it all together
    Organization
    Processes
    Tools
  Constant improvement is a must
    Planning
    Delivery
    Measurement
    Calibration
  The power of integration

Appendix B. Important concepts and terminology
  IBM Tivoli Service Level Advisor concepts
  IBM Tivoli Business Systems Manager concepts

Appendix C. Scripts and rules used in this book

Abbreviations and acronyms
Related publications
  IBM Redbooks
  Other publications
  Online resources
  How to get IBM Redbooks
  Help from IBM
Index
Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication.
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.
Trademarks

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

AIX®, CICS®, CICSPlex®, Database 2™, DB2®, DB2 Universal Database™, Domino®, Eserver®, ibm.com®, IBM®, IMS™, Lotus®, NetView®, OMEGAMON®, OS/390®, OS/400®, Rational®, Redbooks™, Redbooks (logo)™, Tivoli®, Tivoli Enterprise™, Tivoli Enterprise Console®, TME®, WebSphere®, z/OS®

The following terms are trademarks of other companies:

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Intel Inside (logos), MMX, and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, and service names may be trademarks or service marks of others.

Peregrine ServiceCenter is a trademark of Peregrine.
Preface

Traditional availability management focuses on managing the state of IT resources at the component level, without the context of the services required to support vital business functions. As IT organizations mature and focus more on meeting business objectives, they recognize the value of providing sustained levels of availability and of improving service quality in a way that is consistent with business objectives and cost constraints. Managing IT costs requires repeatable and measurable processes, such as the best practices for service level management (SLM) documented in the IT Infrastructure Library (ITIL). Central to the ITIL best practices are the service management processes, which are subdivided into two core areas: service support (day-to-day operation and support) and service delivery (long-term planning and improvement).

This IBM® Redbook takes a top-down approach that starts from the business requirement to improve service management. This includes the need to align IT services with the needs of the business, to improve the quality of the IT services delivered, and to reduce the long-term cost of service provision. It focuses on how clients accomplish this by implementing SLM processes supported by IBM Tivoli Service Level Advisor and IBM Tivoli Business Systems Manager.

The approach used in this book leverages both Tivoli® and non-Tivoli monitoring sources. IBM Tivoli Monitoring for Transaction Performance, IBM Tivoli Monitoring, and various IBM Tivoli Monitoring PACs, along with Peregrine ServiceCenter, serve as interface points that provide the end-user perspective of service delivery.

IT managers and technical staff who are responsible for providing services to their customers can use this IBM Redbook as a practical guide to SLM with IBM Tivoli products. It takes you from a general outline of SLM to specific implementation examples, drawn from banking and trading scenarios, that incorporate the Tivoli monitoring products.
The key elements that are addressed in this redbook are:

  • Organizational considerations for implementing the ITIL processes
  • Identifying which services or business functions will be used for the initial deployment
  • Determining the metrics and monitoring sources required for the definition and evaluation of operational level agreements and service level agreements (SLAs), including business schedules and maintenance periods
  • Leveraging IBM Tivoli Business Systems Manager for configuration and availability management of services
  • Using Peregrine ServiceCenter as the service desk for component-level SLAs and for managing service incidents in real time
  • The value of understanding the impact of end-user response time on service delivery
  • Managing end-to-end services that include mainframe and distributed components
  • Improving service delivery with proactive service management using predictive analysis and operational status alerts
  • Providing ongoing executive-level status and on-demand reporting
  • The next steps for expanding the deployment using the ITIL continuous improvement process approach
  • Overall business value attained through the implementation of these processes and tools

The team that wrote this redbook

This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization (ITSO), Austin Center.

Edson Manoel is a software engineer at IBM working in the ITSO, Austin Center, as a Senior IT Specialist in the systems management area. Prior to joining the ITSO, Edson worked in the IBM Software Group, Tivoli Systems, and in the IBM Brazil Global Services Organization. He was involved in numerous projects designing and implementing systems management solutions for IBM clients and Business Partners. Edson holds a Bachelor of Science degree in applied mathematics from Universidade de Sao Paulo, Brazil.

Kimberly Cox is an IBM Certified IT Specialist with IBM Software Services for Tivoli. She joined IBM in 1998. She has six years of field experience, and her current area of expertise is the architecture and deployment of IBM Tivoli Business Systems Manager/Distributed. She holds a master's degree in computer science and engineering from Pennsylvania State University.

Eswara Kosaraju is an advisory software engineer for the IBM Tivoli Software Group in Research Triangle Park, North Carolina. He joined IBM in 1999.
He holds a master's degree in science and technology in engineering physics from Regional Engineering College, Warangal, India.
Matt Roseblade is a services consultant with the PAN-EMEA Services for Tivoli Software based in the United Kingdom (UK). He has worked for IBM for nine years and has four years of experience in working with IBM Tivoli Business Systems Manager on engagements throughout Europe. Prior to working for IBM Software Group, Matt worked for IGS SSO leading a team responsible for the systems management of IBM and outsourced z/OS® systems across EMEA. During his 14 years in IT, Matt has acquired 12 years of experience in system management disciplines on the mainframe.

Alex Shafir is an advisory software engineer with the IBM Tivoli Software Group in Research Triangle Park, North Carolina. He has been working with IBM Tivoli Business Systems Manager since 1997 and joined IBM in 2000. He has over 30 years of IT experience in both technical and management positions. He has been involved in SLM, capacity planning, and performance management since 1984. He holds a master's degree in electrical engineering from Polytechnical Institute, Riga, Latvia.

Venkat Surath is a senior IT specialist, as well as an IBM Certified IT Specialist, and part of IBM Software Services for Tivoli Americas. He holds a master's degree in computer science from Illinois Institute of Technology, Chicago. Upon graduation, he joined Communications Products Division, IBM Research Triangle Park, NC in 1983 as a software engineer developing network management software. In 1997, he joined Tivoli Services North America, where he provides Tivoli Business Systems Management services. His areas of expertise include IBM Tivoli Business Systems Manager (Distributed) and Tivoli Monitoring for Transaction Performance.

Eduardo Tanaka is a software engineer for the IBM Software Group, Tivoli Division in Research Triangle Park, North Carolina. He worked nine years in UNIX® server hardware and software development and management for a Brazilian company.
Then, in 1990, he joined IBM, where he served as the development, function, and system test team leader for various system and network management products. He holds a degree in electronic engineering from the Instituto Tecnologico de Aeronautica in Brazil.

Brian Watson is a consulting IT specialist from Tivoli Services, EMEA North Region, IBM Software Group. He has worked for IBM for over three years, has over 25 years of IT experience in both public and private sectors, and specializes in systems management. He was one of the first people to be ITIL certified in 1995, and has successfully completed many large and complex systems management projects, including implementations of IBM Tivoli Business Systems Manager.
Front row (left to right): Matt Roseblade, Kimberly Cox, and Venkat Surath; back row: Edson Manoel, Eswara Kosaraju, Eduardo Tanaka, Alex Shafir, and Brian Watson

Thanks to the following people for their contributions to this project:

Peer van Beljouw, Ruth van Ouwerkerk
ABN AMRO Bank, Netherlands

Budi Darmawan, Morten Moeller
ITSO, Austin Center

Rosalind Radcliffe
BSM Integration Architect, IBM Software Group, Raleigh

Eduardo Patrocinio
Tivoli SWAT Team, IBM Software Group, Raleigh

Jayne T. Regan
Service Level Advisor Development Manager, IBM Software Group, Raleigh

Michael D. Tabron
Tivoli Service Level Advisor Interaction Designer, IBM Software Group, Raleigh

Joe Belna, Shawn Clymer, Subhayu Chatterjee
TSLA Development team, IBM Software Group, Raleigh
Gareth Holl
TSLA L2 Support, IBM Software Group, Raleigh

Tom Odefey
TBSM SVT Specialist, IBM Software Group, Raleigh

Tony Bhe
ITM SVT Specialist, IBM Software Group, Raleigh

Jon O. Austin, John Irwin, Yoichiro Ishii
Tivoli Customer Programs, IBM Software Group, Raleigh

Become a published author
Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners, and/or customers.
Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs and increase your productivity and marketability.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our Redbooks™ to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:
- Use the online Contact us review redbook form found at:
ibm.com/redbooks
- Send your comments in an email to:
redbook@us.ibm.com
- Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. JN9B Building 003 Internal Zip 2834
11400 Burnet Road
Austin, Texas 78758-3493
Part 1 Fundamentals
This part includes the following chapters:
- Chapter 1, “Introduction to service level management” on page 3
- Chapter 2, “General approach for implementing service level management” on page 23
- Chapter 3, “IBM Tivoli products that assist in service level management” on page 53
- Chapter 4, “Planning to implement service level management using Tivoli products” on page 109
Chapter 1. Introduction to service level management
This chapter introduces service level management (SLM). It also outlines an approach to managing the business-oriented delivery of IT services, which later chapters of this book describe in detail. Refer to Appendix A, “Service management and the ITIL” on page 447, for details about the organization and activities of SLM and the contributing IT management disciplines.
1.1 Service level management overview
The goal of maximizing profits drives change as well as innovation. It often involves the use of IT to gain a competitive advantage in selling a company’s products and services. To achieve their goals, business units partner with an IT organization to implement technology projects and thus become IT customers. In effect, business units engage IT organizations to provide technology services, and the IT organizations must meet the business units’ requirements for those services. In today’s cost-conscious environment, IT organizations are under pressure to reduce costs even as they must deliver a higher level of service to increasingly well-informed users.

Why service level management?
Customer perception of the availability and performance of IT services drives customer satisfaction. As a service provider, an IT organization must be able to demonstrate and guarantee quality of service to its customers. However, IT management has often struggled to measure delivered services and to reconcile such measurements with the perceived quality of this delivery. To solve this problem, IT organizations are deploying SLM, which includes contracts between IT and its clients that specify the client expectations, IT’s responsibilities, and the compensation that IT will provide if the goals are not met. The main factors driving interest in SLM are:
- Complexity: A dramatic increase in the number of applications, their importance, and demand on IT infrastructure
- Dissatisfaction: Increasing user sophistication and growing dissatisfaction among users with the service that they receive from IT
- Better technology: More mature technology that can provide end-to-end measurement, reporting, and management at a reasonable cost and offer simpler processes

What is service level management?
SLM is a means for the lines of business (LOB) and the IT organization to explicitly set their mutual expectations for the content and extent of IT services.
It also allows them to determine in advance the steps to take if these expectations are not met. The concept and application of SLM allow IT organizations to provide a business-oriented, enterprise-wide service by varying the type, cost, and level of service for each LOB.
According to the IT Infrastructure Library (ITIL), a widely adopted, process-based methodology, SLM is the process of negotiating, documenting, agreeing, and reviewing business service requirements and targets, within service level requirements and agreements between service providers and their customers. These relate to the measurement, monitoring, reporting, reviewing, and continuous improvement of service quality as delivered by the IT organization to the business.
ITIL’s methodology provides two models for IT activities: service delivery and service support.

Service delivery
SLM, along with availability management, capacity management, IT service continuity management, and financial management for IT services, comprises the service delivery model. The primary role of this model is to provide a proactive process for planning services and managing them according to the plan.

Service support
The service support model includes incident management, problem management, change management, release management, and configuration management. The primary role of this model is to provide operational implementation and monitoring of services according to the plan.

Figure 1-1 shows how the service delivery and service support models fit in the ITIL roadmap for service management.

Figure 1-1 The ITIL service management roadmap (panels: Planning to implement Service Management; The Business Perspective; Service Delivery; Service Support; Applications Management; Security Management; IT Infrastructure Management)
According to the ITIL, SLM relates to the other aforementioned disciplines as follows:
- Supported by availability management, IT service continuity management, capacity management, problem management, and configuration management
- Provides information to incident management and change management
- Monitored via financial management for IT services, incident management, capacity management, and availability management
- Supports application management, business processes, and event management

SLM is a disciplined, proactive methodology whose procedures ensure that adequate levels of service are delivered to all IT users in accordance with business priorities and at an acceptable cost. Service levels typically are defined in terms of the availability, responsiveness, integrity, and security delivered to users of the service.

Pros and cons of service level management
Although the duration and scale of SLM implementations may vary, both large and small corporations can capitalize on the benefits of SLM by choosing the components that are most appropriate to their specific SLM needs.
Implementing SLM requires time and effort. It is difficult to justify allocating IT resources to this project if IT is already working with limited resources. In addition, IT clients sometimes abuse the SLM processes, especially when they aim for unreasonable or unattainable service level commitments.
However, this should not stop IT management from developing SLM, which is equally important to business units and the IT organization. SLM increases the efficiency of an IT organization and introduces a financial incentive and penalty system for service delivery. Indeed, the rising popularity of SLM testifies to its value.
For an IT organization, effective SLM is often a matter of survival, particularly if its mission is to operate as a business. The product of an IT organization is the service it delivers to business units.
For an IT organization, providing quality services is not enough. The service must consistently be of the same high quality, both in actual delivery and in the eyes of its users. SLM helps IT organizations improve both the quality of the services provided and the quality of those services as perceived by the users of IT services. Refer to Appendix A, “Service management and the ITIL” on page 447, for a definition of quality of services and how it is perceived by users and customers of IT services.
Both an IT organization, as a seller, and a business unit, as a buyer, need a contract that clearly defines both the capabilities and limitations of this process. For reasons of customer satisfaction and cost control, the product must meet the specifications of this contract.

1.2 Service level management benefits
Businesses need to respond quickly to market demands and seek to maximize profits. These goals often result in a high volume of change for IT organizations. Every IT organization aims to align its goals with business requirements and to better support business needs. IT organizations use SLM to ensure that scarce IT resources are prioritized to focus on key business requirements. By implementing SLM, IT organizations can achieve many of their goals. However, they must overcome many challenges to ensure that the SLM program is successful.

Goals
The goals of SLM are to:
- Understand and meet the requirements of customers and end users
- Use resources efficiently and effectively, and provide value for money
- Improve continuously through a process of learning and growth
- Use internal processes to generate added value for customers and survive
- Establish a business-like relationship between the customer and supplier

Challenges
The challenges of SLM are:
- Divergent views of business and IT organizations
- Diversity of organization business areas
- Changing the mindset from products and systems to services
- Perception of IT (historically not always good)
- Unknown components, dependencies, and ownership
- Poor-quality management information and metrics
- Inability to justify investment or assess risk
- No measure or proof of improvement
- Coping with infrastructure complexity
- Providing consistent and stable services
Faced with many constraints, an IT organization wants recognition for providing good service based on component-centric measurement metrics. At the same time, business units feel that they are paying for a service but still cannot perform their work, and they do not trust an IT organization that always reports good service. SLM advances the measurement of IT effectiveness by moving from component-based evaluation of service to service-based management.
Figure 1-2 illustrates a situation where the reduction of component downtime reported by the IT organization does not improve customer satisfaction, because the damage has already been done. It emphasizes that business units and IT organizations have different views of how customers perceive the quality of the services provided.

Figure 1-2 IT and business views often differ (business manager: business measurements of customer impact; IT manager: IT measurements of component downtime; outages plotted over time)

When used correctly, SLM helps an IT organization deploy resources fairly, defend itself against user criticism, and advertise good service.
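The gap between the two views in Figure 1-2 can be made concrete with a little arithmetic. The following Python sketch is an illustration only and is not part of any Tivoli product. It computes the end-to-end availability of a service whose components must all be up at the same time. It assumes that component failures are independent, and the component names and figures are hypothetical. Under those assumptions, service availability is the product of the component availabilities. As a result, every component can report a respectable figure while the business experiences noticeably more downtime.

```python
from math import prod

def series_availability(component_availabilities):
    """Availability of a service that needs every component to be up.

    Assumes independent component failures, so the probability that all
    components are up at once is the product of their availabilities.
    """
    return prod(component_availabilities)

# Hypothetical service made of five components, each reported by IT at 99.5%.
components = {
    "web server": 0.995,
    "application server": 0.995,
    "database": 0.995,
    "network": 0.995,
    "mainframe back end": 0.995,
}

MINUTES_PER_30_DAYS = 30 * 24 * 60

service = series_availability(components.values())
for name, availability in components.items():
    print(f"{name:20} {availability:.2%}")
print(f"{'end-to-end service':20} {service:.2%}")
print(f"Expected service downtime per 30 days: "
      f"{(1 - service) * MINUTES_PER_30_DAYS:.0f} minutes")
```

With these hypothetical numbers, each component shows 99.50%, while the service as a whole is available about 97.5% of the time. That is roughly 1,069 minutes of downtime per 30 days, which is the kind of difference a business manager notices even when every component report looks good.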
How can SLM help IT deploy resources fairly?

Client satisfaction
SLM requires IT management to initiate a dialog with business units to understand the requirements for service. It also forces business units to clearly state their requirements and expectations. Improved client satisfaction is the main benefit of SLM, which achieves it through negotiated SLAs, established benchmarks for service measurement, and continuing dialog through reporting and reviews.

Managing expectations
SLM helps avoid expectation creep, in which IT clients’ undocumented expectations keep rising. Undocumented user requirements and expectations usually stay ahead of the service that is being delivered. SLAs document negotiated requirements and establish expectations. They also act as a brake when users want higher levels of service than IT has committed to deliver.

Resource regulation
SLM provides a mechanism for governing IT resources. It allows IT to reject resource demands from applications that would unfairly tie up resources and, therefore, to regulate workload based on business priorities. SLM helps avoid capacity problems by providing early warning of SLAs being violated. In such cases, additional equipment might be required to support IT commitments.

Cost control
SLM helps IT determine, through dialog with users, the level of service required and the capacity and staffing needed to provide it. SLM can demonstrate that desirable service is not always affordable, and it can contain costs by moderating user demands for higher levels of service. It allows IT to explain the financial impact of higher levels of service and to avoid unnecessary cost by requiring users to justify the additional cost.

SLM helps to change the relationship between business units and IT from a negative acceptance of IT as a necessary evil to viewing IT as an asset in executing their mission.
When clear service objectives are documented and negotiated measurement reporting is in place, IT has the means to manage its resources as well as user dissatisfaction.

Benefits
In summary, the benefits of SLM are:
- IT service designed to meet agreed requirements
- Clearly defined roles (activities, responsibilities, and authority)
- Measurable, realistic SLAs for improved customer and supplier relationships
- Balances service requirements against the costs
- Reduces risk of unpredictable demand and capacity problems
- Helps identify service weaknesses
- Allows underpinning of supplier management
- Provides basis for charging and measuring value
- Establishes an improvement baseline

1.3 Service level management components
To create and maintain SLM, IT managers need well-defined processes, proven tools, a dedicated effort, and a business-wide commitment. SLM shifts the IT management perspective away from technology and toward the demands of the business and user experiences. It introduces new methods and procedures and enhances existing ones.
SLM focuses on the management of an IT service in support of a specific business process. An IT service includes the applications and infrastructure resources used by this business process. Management includes planning, monitoring, and reporting. SLM uses SLAs to define a service and its management criteria.
SLM is a process that is supported by several other processes, including performance and availability management. Both performance and availability management processes are essential for monitoring SLAs. However, an understanding of end-user perspectives, gained through synthetic transactions and communication with users, is also critical. Accordingly, monitoring of performance and availability must be adjusted to account for user experiences.
For this reason, IT operations must incorporate end-user experiences and business function knowledge into the management of IT infrastructure and applications. In addition, IT support must incorporate business requirements into asset management, change management, and incident management.
The following sections introduce four SLM components that are essential for implementing a successful SLM program:
- Processes
- Documentation
- People
- Tools
1.3.1 Processes
The functions in SLM can be divided as follows:
- Identify users’ expectations and define parameters for service.
Ideally, IT must identify all of the business processes that must be managed. In practice, it is acceptable to select the critical business processes during the first stages of the SLM process implementation and then incorporate additional business processes as the SLM process matures.
The IT organization can work with business owners to pinpoint the elements of these business processes. Together they can define service parameters such as end-user expectations of service, participating IT application and infrastructure components, and metrics for measuring service levels.
- Assess service capabilities and negotiate service agreements.
First, an IT organization must have a clear understanding of service expectations, composition of service elements, and service level measurement metrics. Then it must collect data and assess its current capabilities for meeting a customer’s expectation of service levels.
After studying current capabilities for delivering all services required and identifying opportunities for improvement, IT management is ready to talk with customers about the service levels that it can provide. IT should avoid technical terminology and describe services and expectations in a manner that is understandable to its customers. At the same time, IT should fully understand what service levels it can deliver and obtain agreement from its customers on service level measurement and reporting criteria. IT must document negotiated expectations and measurement metrics, as well as agreed-upon acceptable service level values.
- Manage to meet service level objectives (SLOs).
IT must align its processes to proactively monitor, measure, and manage against negotiated SLAs.
Accordingly, IT must develop SLOs for the underlying IT components that meet SLA obligations, measure actual values against those SLOs, and relate the measured status to the SLAs. Upon recognizing service level degradation (preferably through real-time alerts), IT can immediately start diagnosing the problem and restoring service to acceptable levels as defined by SLAs. If the problem is serious, IT may also notify users so they can avoid the affected services and unnecessary calls to the help desk.
Agreements that relate to IT operations and support (OLAs) help IT recognize component issues quickly and evaluate their measurements before they affect SLAs and IT customers. IT must establish monitoring processes, measurement metrics, and automation that allow technical staff to respond promptly to problems, in addition to reporting an OLA’s status to management.
SLM uses reporting to communicate overall service level performance to IT and business management. Effective reporting should show IT performance against service level commitments (successes and failures). It can be used together with financial incentives to improve IT processes and user behavior.
- Continue service refinement and improvement.
The SLM process should always be examined for process effectiveness, service changes, and reporting accuracy. Customer expectations change as business processes grow and new applications and users are added. As monitoring technology improves, IT can expand the metrics that measure component performance and customer satisfaction. IT must periodically re-evaluate the services it provides.
Service improvement is a continuous process that allows IT to add more value, adjust to new realities, justify new technology, and often derive more revenue. The same is true of the SLM process itself, which needs continuous improvement to gain the trust of business owners, to improve efficiency through automation, and to improve effectiveness through a better understanding of business-to-IT relationships.
Figure 1-3 illustrates the SLM functions.

Figure 1-3 SLM process (cycle: Define parameters for services; Negotiate SLAs; Manage and monitor SLOs; Service refinement and improvement)
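The "manage to meet SLOs" function depends on recognizing degradation before an SLA is breached. One way to express this is to translate an availability target into the downtime it allows in a period, and then raise a warning once a chosen share of that allowance has been consumed. The Python sketch below shows this under assumed numbers. It is not a feature of any product described in this book. The class name, method names, and 75% warning fraction are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AvailabilityTarget:
    """Availability objective evaluated over a fixed period (hypothetical model)."""
    target: float                   # for example, 0.995 for 99.5%
    period_minutes: int             # length of the evaluation period
    warning_fraction: float = 0.75  # assumed early-warning threshold

    def allowed_downtime(self) -> float:
        """Minutes of downtime the target tolerates in one period."""
        return (1 - self.target) * self.period_minutes

    def status(self, downtime_so_far: float) -> str:
        """Classify the period so far as 'ok', 'at risk', or 'breached'."""
        allowed = self.allowed_downtime()
        if downtime_so_far > allowed:
            return "breached"
        if downtime_so_far >= self.warning_fraction * allowed:
            return "at risk"
        return "ok"

# 99.5% over a 30-day month allows about 216 minutes of downtime.
monthly = AvailabilityTarget(target=0.995, period_minutes=30 * 24 * 60)
for minutes_down in (60, 170, 230):
    print(minutes_down, monthly.status(minutes_down))
```

An "at risk" result is the kind of early warning that lets operations staff act before the SLA itself is violated. The warning fraction is a policy choice that would need to be agreed with the service owners, not a fixed rule.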
1.3.2 Documentation
Because SLM relies on several parties involved in defining the processes, negotiations, penalties, and so on, documentation is a must. The following documents support SLM:

Service level agreements
An SLA is an agreement between business units (the customer) and the IT organization (the service provider). It describes the service and service level measurement metrics, defines the approval and reporting process, and identifies the primary users. It can also include financial terms and conditions.
SLAs provide a mechanism for holding both IT and its customers accountable for the service levels provided, which are negotiated and agreed to based on business requirements, priority, and cost. SLA measurements must be directly aligned with customer expectations. SLAs are the basis for service level evaluation and improvement processes that include periodic reviews and adjustments if needed.

Operational level agreements
An operational level agreement (OLA) is an internal agreement that should be established between all business and IT groups before an SLA is executed. The OLA establishes specific requirements that each IT group needs to meet in support of service levels and makes each group accountable for its contribution to the overall improvement of service levels. Well-defined OLAs show IT management which areas have more impact on service levels, where to focus attention and financial rewards, and how each group can contribute if business requirements require a change of SLAs.

Underpinning contract
IT should establish underpinning contracts (UCs) for any service provided by external service providers and vendors. UCs add accountability for the external components of service levels in the same way that OLAs account for the internal components. IT can use the contractual agreements that it has with its third-party vendors and feed the pertinent data into the SLM process.
As service levels change, IT may need to renegotiate external contracts with vendors and modify the UCs. Figure 1-4 illustrates the flow of customer, internal, and external contracts.

Service catalog
The service catalog provides a place to document all services provided to the customers and to record such details as key features, components, charges, and dependencies for each service.
Figure 1-4 SLM customer, internal, and external contracts (labels: Customers; SLA; IT Services Provider; Service 1; Service 2; IT Infrastructure; OLA; Underpinning contracts; Internal organization; External organizations)

Service level objectives
SLOs define the service levels, agreed to by the parties that negotiated the SLAs, that need to be monitored and reported. They include one or more service level indicators (SLIs) presented in the business context. The SLO defines the component of service and how it is measured.
SLIs determine the measurement metrics used to quantify service levels. SLIs should reflect the user perspective, such as pain points and priorities, service availability, and responsiveness.
The most common SLOs are for availability and performance. A service availability SLO may include an SLI measured as the percentage of time that the service was in an available state. A performance SLO may include two SLIs: service responsiveness (response time) and completed work (number of transactions).
An IT organization must use monitoring to measure the actual results of SLIs and reporting to communicate these results to business and IT managers. The format, details, and period vary depending on the recipients of reports. SLM can also include real-time information, alerting IT when results approach or breach the service levels guaranteed by SLAs.
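The following Python sketch illustrates how the SLIs described above can be quantified. It computes an availability SLI from equally spaced status samples. It also computes the two performance SLIs, average response time and number of completed transactions. It then checks each SLI against an SLO target. Several elements are assumptions made for the example: the sample data, the target values, and the reading of "completed work" as a minimum transaction count. Products such as those described in this book collect and evaluate these measurements in their own ways.

```python
from statistics import mean

def availability_sli(status_samples):
    """Percentage of equally spaced status samples in which the service was up."""
    if not status_samples:
        raise ValueError("at least one status sample is required")
    return 100.0 * sum(status_samples) / len(status_samples)

def performance_slis(response_times):
    """Responsiveness (mean response time in seconds) and completed work (count)."""
    return {
        "avg_response_s": mean(response_times) if response_times else None,
        "transactions": len(response_times),
    }

# Hypothetical SLO targets for one business service.
SLO_TARGETS = {"availability_pct": 99.0, "max_avg_response_s": 2.0,
               "min_transactions": 1000}

def evaluate(status_samples, response_times, targets=SLO_TARGETS):
    """Return True/False per SLI, showing whether each target was met."""
    availability = availability_sli(status_samples)
    perf = performance_slis(response_times)
    avg = perf["avg_response_s"]
    return {
        "availability": availability >= targets["availability_pct"],
        "responsiveness": avg is not None and avg <= targets["max_avg_response_s"],
        "completed_work": perf["transactions"] >= targets["min_transactions"],
    }

# One day of 1-minute polls with 10 minutes down, plus 1,200 transactions.
samples = [True] * 1430 + [False] * 10
times = [1.4] * 1200
print(evaluate(samples, times))
```

The sketch keeps SLI calculation separate from the comparison with targets. This mirrors the distinction in the text between measuring actual results (monitoring) and communicating them against agreed objectives (reporting).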
Service improvement program
SLM is a continuous process that includes service level improvement and SLM improvement activities. IT should never be satisfied with the current level of service, even if it satisfies its obligations to customers. IT should develop a service improvement program and document a service quality plan. This plan should cover how to maintain awareness of changing business objectives, cost-effectively add new technology, improve daily operations, and expand SLIs and reporting to match user perception of service as closely as possible.

1.3.3 People
The SLM process requires the involvement of people at various levels within business and IT organizations. The request for service improvements often starts with the head of a business unit or a senior executive who begins demanding more consistent service and accountability from IT. IT management may respond with tactical improvements but may eventually be forced to implement an SLM program.
SLM is a collaborative effort. Its implementation includes a number of people in dedicated or supporting roles. Responsibility for overall management of the SLM program is most likely to be assigned to a senior IT executive. IT may also assign a dedicated project manager and a dedicated service level manager. The project manager is responsible for implementing the SLM project. The service level manager is active both during and after the implementation phase and also coordinates ongoing management and improvement programs. In their effort, both the project manager and the service level manager need support from line managers of IT and business groups.
The SLM team must include representatives from both business units and IT service delivery and may require some assistance from consultants. However, SLM is primarily an IT effort, because IT must handle the technical aspects of the SLM implementation, deployment, and operation.
The SLM program must have an executive sponsor who provides funding for the program and is ultimately responsible for its success.
For more details about the roles and responsibilities of the people involved in implementing SLM, see 2.2.1, “Identifying roles and responsibilities” on page 26.

1.3.4 Tools
While developing the SLM plan, the IT organization must choose tools to enable the SLM process that is being developed. Depending on the selected measurement metrics and the service composition of related IT resources, these tools support monitoring of the chosen service indicators and user experiences. They also provide analytical capabilities and aggregation for reporting.
In addition, IT must organize the collected data and make it accessible to everybody with a stake in the SLM process. Analytics and reporting must present this data in a manner that aligns the service views of both IT and its customers, allowing them to reconcile the customers’ perception of service with the service levels delivered by IT.
IT wants to understand how resource performance and availability affect service levels and what adjustments are needed to improve service. Customers want to make sure that IT delivers availability and responsiveness for the critical applications that they use to automate their business processes. When their business process is affected, they want IT to report it accurately so that they can impose the negotiated penalties on IT.
SLM is a popular topic, and many vendors claim that their products provide SLM solutions. Some products are specifically designed for SLM. Others offer only some monitoring capabilities but are still marketed as SLM solutions.
When implementing SLM, IT should choose the following tools to meet its design specifications:
- Monitoring tools that provide the measurement metrics that need to be collected
- Reporting tools that process the captured data and satisfy all levels of report recipients
- Analytical tools that aggregate and analyze the collected SLM data in a manner that offers fast recognition of business impact and proactive response
- Administration tools that improve the productivity of SLM operators and users and integrate the monitoring, reporting, and analytical tools

This book introduces solutions provided by IBM, which include a wide range of products that can monitor a variety of distributed and mainframe servers, databases, transactions, networks, Web servers, and end-user experiences. In addition, IBM offers analytical products in the SLM space that provide a real-time integrated event console, event correlation, business service management (BSM), and proactive SLM. All these products accept data from the majority of today’s monitoring products.
  • 34. 1.4 Business service management approach to service level management
The philosophy of managing services in a business context is gaining traction with IT organizations that are trying to improve relations with their customers. These same organizations are also trying to overcome historical challenges such as customer perception and the increasing complexity of technology. Understanding how business processes use shared infrastructure resources significantly improves the ability of business and IT executives to negotiate, measure, and evaluate service contracts. Many IT organizations are turning to BSM solutions to facilitate a business-defined view of IT-delivered services. BSM solutions provide facilities and analytics that enable IT to manage service levels with the business consumer of a specific business process, to ensure that the SLA associated with this process is fulfilled.
Why business service management?
Earlier, this chapter introduced SLM as the management of IT resources to deliver the required service at the required level of quality. BSM allows IT to incorporate business knowledge into the service management process and to translate data from traditional infrastructure and application management tools into business-level representations. BSM relies on IT organizations working with business units to map resource-to-service relationships and to organize them into structures that depict and visualize the components of the IT infrastructure and that automate components of the business process based on knowledge of those relationships. Accordingly, with BSM, IT management and business executives can reconcile their perspectives on IT performance, because BSM can report both real-time status and historical service level compliance for each business function supported by IT.
What is business service management?
BSM is a service management application that aligns IT operations with business processes.
Therefore, it allows business functions to gain maximum leverage from IT resource management. BSM solutions enable real-time management of events and service levels based on knowledge of their relationships to an IT service provided to the business entity responsible for a business process. BSM provides IT with a set of algorithms and visualizations that IT must incorporate in its SLM processes. It is designed to display and report the service
  • 35. delivery health and business impact of IT based on the performance and availability of IT resources. BSM visualizations are driven by federated event and monitoring data as well as by business and IT relationship data.
The four aspects of BSM are:
– Identifying the components of a business system
– Measuring the performance and availability of those components
– Ensuring that the components are performing within SLOs
– Alerting on any deviation or potential deviation from SLOs
The concepts behind BSM include:
– Resources are the components of the IT infrastructure.
– A business transaction is a group of IT resources supporting a particular IT workload.
– A business system is a group of resources that supports a business goal.
– A business process is composed of some automated (IT service-based) steps and some manual steps.
– When policy data or service level information is attached to a business system, it becomes an IT service. An IT service can be perceived as the collection of IT resources that make up the automated part of the business process.
1.4.1 Convergence of business service management and service level management
With BSM, an IT organization gains insight into a business process. It can use this insight to design SLM based on the relationship structures described earlier, which we call business systems. A business system is a representation of a group of diverse but interdependent enterprise resources that are used to deliver specific business functionality. Business systems allow flexible and automated arrangement of IT resources into models of the services that IT provides to automate business functions. Together, they represent what we call the Business/IT knowledge base, which is an important element of the SLM methodology.
As a result of a joint effort to develop the Business/IT knowledge base, an IT organization and its business units have a framework for SLAs that allows them to:
– Identify all components of a service
– Create SLA and OLA contracts based on business systems
– Measure resource performance and availability by business system
  • 36.
– Get service violation and trend alerts for any deviation or potential deviation from the SLO
– Ensure that services are performing within the SLO
The Business/IT knowledge base provides the foundation for BSM and SLAs. In practice, BSM allows IT to decompose business processes into IT systems and to document the negotiated service levels in SLAs, which are then managed by BSM through monitoring and analytics organized by business system. BSM accepts data from a variety of performance and event data sources that monitor IT resources. The BSM analytics then consume this data to determine business system status and its business impact. Figure 1-5 demonstrates that business systems are a cornerstone for establishing service levels and managing IT resources based on business objectives for IT services.
[Figure 1-5 Business system organizes IT resources and other business systems (panels: The Technology, such as databases, Web servers, and applications; The Business, such as banking, trading, and e-commerce; Business Systems; Service Level Management; Business Systems Management)]
A successful SLM program that aims to solve user perception issues should establish a common understanding between business units and the IT organization on service delivery and quality of service measurements. As outlined earlier, the BSM approach to SLM helps this effort by collecting business knowledge and exposing how services use resources. This makes SLA contracts and measurement metrics more meaningful to both IT and business units.
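The status determination described above can be pictured with a minimal Python sketch. It assumes a simple worst-of rollup, in which a business system is only as healthy as its least healthy resource or nested business system. The class names, the three-level `Status` scale, and the rollup rule are illustrative assumptions, not the algorithm used by Tivoli Business Systems Manager.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical three-level severity scale; higher values are worse."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class Resource:
    name: str
    status: Status = Status.OK


@dataclass
class BusinessSystem:
    """A named group of resources and, optionally, nested business systems."""
    name: str
    resources: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def status(self) -> Status:
        # Worst-of rollup: take the most severe status among the system's own
        # resources and the rolled-up status of each nested business system.
        statuses = [r.status for r in self.resources]
        statuses += [child.status() for child in self.children]
        return max(statuses, default=Status.OK)

    def degraded_resources(self) -> list:
        # Names of all resources, including those in nested systems, that are not OK.
        names = [r.name for r in self.resources if r.status != Status.OK]
        for child in self.children:
            names += child.degraded_resources()
        return names


# Hypothetical example: an e-commerce business system that contains a storefront.
orders_db = Resource("orders-db")
web_server = Resource("web-server-1", Status.WARNING)
storefront = BusinessSystem("Storefront", resources=[web_server])
ecommerce = BusinessSystem("E-commerce", resources=[orders_db], children=[storefront])

print(ecommerce.status().name)         # WARNING
print(ecommerce.degraded_resources())  # ['web-server-1']
```

A production rollup might weight resources differently (for example, tolerating one failed server in a redundant pair); the worst-of rule is used here only because it is the simplest to follow.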
  • 37. 1.5 Improving service level management through integration
SLM is the continuous process of measuring, reporting, and improving the quality of the agreed-upon service that an IT organization provides to the business. This requires that an IT organization clearly understand each service it provides, the service’s business importance and priority, who consumes the service and how, and which IT resources it uses. Such information is usually dispersed, and IT needs significant effort to obtain it and organize it in a meaningful way that exposes how the business uses IT resources. As demonstrated earlier, you can use BSM to compose and refine services from related resource and business system objects. Service compositions defined by BSM allow IT to design SLAs and service level measurement criteria in an integrated manner and provide:
– Improved effectiveness of SLAs
When an IT organization uses the same service definitions for aggregating monitored data, service management, and service evaluation, it can significantly improve the effectiveness of SLAs and make investigations of SLA violations more productive.
– Improved effectiveness of communication
Through a set of federated monitoring data and views, IT can use service compositions to communicate effectively with users (while developing and reporting SLAs) and to prioritize the management of incidents.
Figure 1-6 presents a high-level view of integrating monitoring, service management, and service evaluation around service compositions.
Management of IT resources within the context of the business services they provide includes:
– Automatic discovery of IT resources and their relationships
– Automation for constructing services and business systems
– Detection of incidents for IT resources in a service context
– Determination of service status and the business impact of incidents
– Warehousing of historical data for IT resources and services
– Service level evaluation and alerting in a service context
– Reporting of service health and service level compliance with SLAs
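Service level evaluation and alerting in a service context, including the distinction between an actual SLO violation and a trend toward one, can be sketched as follows. This is a minimal sketch under stated assumptions: availability is the only metric, downtime is already aggregated for the service, and the trend check is a plain linear projection of the downtime observed so far. The names `AvailabilitySLO` and `evaluate` are hypothetical and do not correspond to any Tivoli product interface.

```python
from dataclasses import dataclass


@dataclass
class AvailabilitySLO:
    """Hypothetical availability objective for one IT service."""
    service: str
    target_pct: float     # for example, 99.5 means 99.5% availability
    period_minutes: int   # length of the evaluation period


def evaluate(slo, elapsed_minutes, downtime_minutes):
    """Classify the period so far as 'violation', 'trend warning', or 'ok'."""
    if not 0 < elapsed_minutes <= slo.period_minutes:
        raise ValueError("elapsed_minutes must be within the evaluation period")
    allowed_downtime = slo.period_minutes * (1 - slo.target_pct / 100)
    if downtime_minutes > allowed_downtime:
        # The downtime allowed for the whole period is already used up.
        return "violation"
    # Linear projection: assume downtime keeps accruing at the observed rate.
    projected = downtime_minutes * slo.period_minutes / elapsed_minutes
    if projected > allowed_downtime:
        return "trend warning"
    return "ok"


trading = AvailabilitySLO("Online trading", target_pct=99.5, period_minutes=30 * 24 * 60)
ten_days = 10 * 24 * 60
print(evaluate(trading, ten_days, downtime_minutes=60))   # ok
print(evaluate(trading, ten_days, downtime_minutes=90))   # trend warning
print(evaluate(trading, ten_days, downtime_minutes=250))  # violation
```

With a 99.5% target over a 30-day period, the allowed downtime is 216 minutes. Ninety minutes of downtime after 10 days does not yet violate the SLO, but it projects to 270 minutes for the period, which is why the sketch reports a trend warning rather than a violation.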
  • 38. [Figure 1-6 Using business knowledge for managing IT services (elements: business service management with business systems and services; service level management with SLAs, OLAs, and contracts; service management, service evaluation, measurement metrics, monitoring, service composition, and service delivery built on the Business/IT knowledge base, which draws on business process knowledge and requirements from the business and on applications and infrastructure from information technology)]
Large enterprise IT environments deploy many systems management products to operate their diverse resources. It is difficult to integrate data from such a variety of data sources into the SLM process. BSM solutions meet this challenge by accepting data from all major monitoring vendors. BSM then integrates this data by supplying business analytics and automation that allow IT to define and manage services throughout the SLM life cycle.
Armed with business knowledge and with negotiated service compositions and measurement metrics, an IT organization can design its business systems management, SLM, and monitoring processes to measure a quality of service that correlates with user perception. To improve acceptance, IT must continue to
  • 39. refine the service composition and measurement metrics until they become transparent to business units.
1.6 Scope of this book
As outlined in this chapter, there are many aspects to SLM. One of the main objectives is to relate the definition of service to the perception of IT users and business unit management. The quality of services delivered to these users is judged by the users’ ability to use the services effectively and cost-efficiently when their job functions require them. Although IT managers place a high priority on meeting this objective, reporting quality of service in a way that users accept as matching their experience is often hit-or-miss. The BSM approach to SLM (outlined earlier in this chapter) offers significant improvements in this area by making business-to-IT relationships more factual and transparent through several implementation steps.
The topics in this book are structured to guide you from the analysis and planning of SLM to a detailed implementation of an integrated BSM, SLM, and monitoring approach using Tivoli products. They include a summary of improvement opportunities for each topic. The remainder of this book is divided into the following chapters:
– Chapter 2, “General approach for implementing service level management” on page 23, describes a generic approach for SLM implementation, following the ITIL process improvement model as closely as possible.
– Chapter 3, “IBM Tivoli products that assist in service level management” on page 53, provides an overview of the IBM Tivoli products that support SLM processes.
– Chapter 4, “Planning to implement service level management using Tivoli products” on page 109, outlines the planning and implementation of SLM and BSM through the integration of several IBM Tivoli products.
– Chapter 5, “Case study scenario: IRBTrade Company” on page 197, provides a test case of an SLM program implemented to manage the distributed environment of a trading company.
– Chapter 6, “Case study scenario: Greebas Bank” on page 315, provides a test case of an SLM implementation of enterprise management (mainframe and distributed) for a bank.
– Appendix A, “Service management and the ITIL” on page 447, discusses the various components and definitions behind service management in ITIL terms. It is designed as a reference for anyone involved in the SLM process.
  • 40. Chapter 2. General approach for implementing service level management
Service level management (SLM) is an important initiative that requires the participation and support of many resources. A successful implementation has an established business need, commitment from all those involved, and funding to ensure adequate resources and tools for completion. It requires a strategy and a flexible plan for negotiating, implementing, and maintaining service level agreements (SLAs). The typical motivation for SLM is the need to improve IT service delivery as perceived by customers. In many cases, the team responsible for IT service delivery does not have all the information required to meet the needs of the business. As a result, IT delivers and reports top-quality service, while business units experience service that they perceive to be of low quality. SLM provides a means to overcome this challenge, providing the many benefits described in 1.2, “Service level management benefits” on page 7. Executive management commitment to SLM is essential, since the goal of aligning IT and business requires an organization-wide commitment from both business and IT representatives. It takes hard work and discipline to implement SLM, and simply providing funding is not enough. Executive management can
  • 41. facilitate commitment during the entire SLM planning and implementation cycle by continually motivating the change and leading by example.
This chapter describes a generic approach (Figure 2-1) for implementing SLM after the decision to do so has been made. This methodology starts with a planning phase, continues with implementation, and concludes with ongoing management and improvement of the overall process. It follows the IT Infrastructure Library (ITIL) process improvement model.
[Figure 2-1 SLM processes implementation approach. Planning: established decision to implement SLM; define key players; understand the services; assess ability to deliver. Implementation: develop service level objectives; negotiate service level agreements; implement SLM management tools; establish reporting function; adjust IT processes to include SLM. Improvement process: improving the quality of service levels and the efficiency and effectiveness of SLM. Ongoing SLM program: maintenance of service definitions; SLA management via historical reporting; priority management of real-time faults.]
  • 42. Chapter 1, “Introduction to service level management” on page 3, introduces the four key components of SLM: people, processes, documentation, and tools. This chapter identifies and discusses each of these components in more detail.
2.1 A look at the ITIL process improvement model
An organization may already have some elements of SLM established and operational. Therefore, this chapter presents SLM implementation as a process improvement effort, applying the ITIL process improvement model. The model can be summarized by the following questions, asked in the order presented:
1. Where do we want to be?
This question provides the vision and objectives for an SLM implementation. It is answered by clearly defining the services provided, determining the current perception of the quality of those services, and defining the desired quality of the services to be provided to customers. These topics are addressed in 2.2, “Planning for service level management implementation” on page 26.
2. Where are we now?
Perform a thorough assessment of the existing IT infrastructure’s ability to deliver the defined services and of its existing monitoring capabilities. After this task is completed, perform a gap analysis of both the IT infrastructure and the monitoring capabilities to determine what is needed for IT to deliver services at the level of quality required by the business and expected by customers. These topics are also addressed in 2.2, “Planning for service level management implementation” on page 26.
3. How do we get where we want to be?
Based on the information gathered from the previous two questions, an IT organization prepares service level objectives (SLOs), constructs SLAs, and negotiates them with customers. This is also the time to put additional IT infrastructure, monitoring tools, or both in place. Most importantly, existing IT processes are adjusted to accommodate SLM. These topics are addressed in 2.3, “Implementing service level management” on page 35.
4. How do we know we have arrived?
When the implementation is complete, hold review sessions to ensure that all specified goals were met, and discuss how to resolve any unmet goals. Establish quality management for IT services and SLM process improvement programs
  • 43. at this time. These topics are also addressed in 2.3, “Implementing service level management” on page 35.
2.2 Planning for service level management implementation
This section describes the planning activities that lead to a successful SLM implementation. The desired outputs of this phase are:
– A carefully chosen team that is capable of and committed to implementing SLM
This team should include the project manager and service level manager roles to keep deployment participants on track and communicating regularly.
– A thorough understanding of the services to be managed
To accomplish this, collect information from both the business and technical perspectives, and have the service level manager mediate between them. Business owners provide an overview of the major functions and an understanding of user demand. The IT service delivery organization provides detailed information about the components that make up the services that support the business functions. Identify the current perception of the quality of the identified services and the desired quality level of those services.
– An assessment of the ability to deliver services at the expected level of quality
This includes an understanding of the current capabilities of the IT infrastructure to deliver services to the quality expected by the business owners. Consider users’ current perception of service levels in this assessment. Based on this assessment, improvements to the IT infrastructure may be required. At this time, also define a high-level design that assesses the existing monitoring capabilities and identifies any additional monitoring tools and processes. This forms a baseline for measuring the expected quality of services.
To some, all of this preparation may seem time consuming. However, it leads to clearer objectives, which in turn contribute to project success.
2.2.1 Identifying roles and responsibilities
SLM requires the participation and support of many different organizations within a business. It is important to clearly define the roles and responsibilities of the people involved and then to identify the specific people to take on these roles. It is also important to involve all team members from the start of the project and to
  • 44. facilitate regular deployment checkpoint meetings. This ensures that everyone has a consistent level of information throughout the deployment.
Choosing the correct people is critical. Whoever is chosen must represent the views of the decision makers from both IT and business organizations and have the final word on the SLM implementation plan.
The SLM deployment team should include people from the areas shown in Figure 2-2.
[Figure 2-2 Key representation in an SLM deployment (executive sponsor, project manager, service level manager, business representatives, and IT representatives)]
The following sections summarize the responsibilities of the key participants.
Executive sponsor
The executive sponsor is typically the head of the line of business and is responsible for delivery of business services to end users. This person understands the overall picture of the business process and can state the purpose of the business. This person has the ultimate go or no-go authority for the project and is the final arbiter of problems and disagreements.
Project manager
Implementation of SLM is a large-scale project and should be treated as one. Appoint a qualified, full-time project manager to work closely with the service level manager and other people involved in the project to incorporate the SLM activities into a project plan.
  • 45. Service level manager
This is an important role with the primary responsibility of project ownership. When an SLM project is owned by a service level manager, it is more likely to be effective and to produce the intended benefits. This person acts as a liaison between the business and IT units, ensuring that the business units clearly state their requirements and that IT understands them. As such, the person or persons fulfilling this role must have either the appropriate seniority within the organization or clear, visible support from upper management in both IT and business organizations. Additional responsibilities for the service level manager include:
– Creating and owning the SLM people structure within the organization
– Presenting the plan for SLM to all of the groups involved
– Describing how SLM will impact each group
– Describing how each group can contribute to a successful implementation
This includes the risks and costs involved. The more complex the plan, the higher the cost (more servers, more person-hours).
– Asking each group for support, involvement, and agreement
– Establishing a regular service level review process with both the customer and the IT provider
– Negotiating and maintaining the SLAs with the customer
– Negotiating and maintaining the OLAs with the IT provider
– Analyzing and reviewing service performance regularly against SLAs and OLAs, leading to adjustments as appropriate
– Creating and disseminating regular reports on service performance and achievement
– Coordinating temporary changes to required service levels
Business representatives
The primary responsibility of this role is to explain the overall picture of the business and of its components. Business services may include a number of services that require IT support; therefore, the performance of business owners depends on IT performance. Business owners understand their service well but may not understand what comprises an IT service. In large environments, this role can be filled by several people, one for each operational unit. A secondary responsibility of this role is to keep the SLM implementation business-oriented.
  • 46. IT representatives
There are many responsibilities for this role, and they are typically fulfilled by more than one person. The responsibilities include:
– Providing systems management information such as hardware and operating systems, network infrastructure, application monitoring tools, and so on
– Describing the IT components of the business service
– Providing information about the day-to-day operation of the business components
– Providing feedback from customers to the overall SLM implementation process
This is typically the service desk or customer support group, which has a primary line of communication to the service users.
– Providing the business impact of problem and change management
– Taking on the role of technical lead for the tools used in an SLM implementation
This group should have, or be ready to learn, the skills required to deploy the actual tools to be used, as described in 2.3.3, “Implementing service level management tools” on page 38.
2.2.2 Understanding the services
The purpose of the activities described in this section is to improve the delivery of services to customers. You cannot do this without a clear understanding of what customers want and what they are getting now. This section establishes a high-level definition of the requirements. The people identified in 2.2.1, “Identifying roles and responsibilities” on page 26, should participate in these activities. Most of the information comes from the business representatives, who understand what services must be provided to meet the needs of the customers. The information also comes from the IT representatives, who understand what IT resources it takes to support the business processes. The business representatives provide the functions of the services, and the IT representatives provide information about the underlying IT components of the service. The service level manager, who understands both business and technical aspects, is an important participant as well. One way to obtain the required information is to interview the right people, feed back what was said, and check that you have understood it correctly before moving on to the next stage. Another way to obtain the information is to
  • 47. have moderated discussions with multiple people so that information and expectations can be aligned among the business and IT participants.
Defining services
For the purpose of this redbook, a service is defined as a logical grouping of IT systems and applications that together deliver one or more functions to one or more users. From the IT perspective, it is a set of applications that serve a specific business objective, with each application consisting of components made up of IT resources. From the business perspective, a service is the mapping of IT resources to business processes. According to ITIL, a service is the IT system or systems that enable customers and users to implement business processes. For more information about the ITIL definition, see the SLM chapter in the ITIL Service Delivery book, which also introduces and encourages the use of a service catalog.
Note: It is possible for a service to be made up of other services. For example, online banking can be a service that is made up of services for checking balances, depositing funds, withdrawing funds, and so on.
A high-level example definition of a service can be as simple as one of these:
– My service is online banking.
– My service is a travel reservation system.
– My service is a payroll system.
To complete the definition of the service, you must now understand the underlying IT components that make up the service. Typically, a component represents a machine or an application with multiple event sources mapping to it. It is important to know what applications make up the components and how these applications relate to other applications, including dependencies. The following list provides suggestions to assist in defining the business service:
Business information
– List the functions provided by the service. You may have to speak about applications if the concept of service is unfamiliar.
– Describe the relationships between the functions. Provide a schematic that describes how each function is integrated to create the service. The schematic may include a business flow diagram.
Technical information
– Name the applications or components that deliver the service.
– State the purpose of each application or component.
  • 48. – Describe the relationships between the applications or components. Provide a schematic that describes how each application is integrated to create the service. The schematic may include a data flow diagram. The relationships may also be described in an architecture document.
Table 2-1 provides a useful template for keeping track of components and the relationships between them.
Table 2-1 Business service component relationships
Business component | Depends on | Impact examples | Comment
Application server | Operating system, network | Application A availability | This application provides <...> to the business service.
Operating system | Hardware server | Availability of applications running on the operating system | The operating system is the platform for applications A, B, and C.
Network device | None | Various |
Establishing an initial perception of service
When an SLM process is in place and the services that will participate in it are identified, establish an initial perception of the quality of those services and use it as a starting point for improvement through SLM. There are two sides to the perception of services. One side comes from the business owners and is defined in business rather than technical terms. The other side comes from IT service delivery and is likely to be in more technical terms.
From the business perspective, examples of initial perception of service may be:
– The Web site is rarely available in the evenings.
– Response time is unacceptable.
– We are losing customers due to bad service.
From the IT perspective, the perception of service may be:
– Servers are available 98% of the time.
– CPU utilization is at acceptable levels.
– Existing systems management tools are being underused.
As shown in this example, both perceptions are credible to the organization, yet distinct from each other. Record these perceptions so that, when implementation begins, you can reference them and choose appropriate metrics for measurement.
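The relationships recorded in Table 2-1 can also be kept in machine-readable form and used to answer the question "what is affected if this component fails?" The following Python sketch assumes a hypothetical dependency map loosely modeled on Table 2-1 (Application A depends on the application server, which depends on the operating system and the network, and so on). It inverts the map and walks it breadth-first to find every directly or indirectly impacted component. The map and the function name `impacted_by` are illustrative only.

```python
from collections import deque

# Hypothetical dependency map loosely modeled on Table 2-1:
# each component lists the components it depends on.
DEPENDS_ON = {
    "Application A": ["Application server"],
    "Application server": ["Operating system", "Network device"],
    "Operating system": ["Hardware server"],
    "Hardware server": [],
    "Network device": [],
}


def impacted_by(failed, depends_on):
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the map so that we can walk from a component to its dependents.
    dependents = {}
    for component, needs in depends_on.items():
        for dep in needs:
            dependents.setdefault(dep, []).append(component)

    # Breadth-first search over the dependents graph.
    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted


print(sorted(impacted_by("Hardware server", DEPENDS_ON)))
# ['Application A', 'Application server', 'Operating system']
print(sorted(impacted_by("Network device", DEPENDS_ON)))
# ['Application A', 'Application server']
```

In this example, a hardware server failure impacts the operating system, the application server, and Application A, while a network device failure impacts only the application server and Application A.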
  • 49. The following list provides suggestions to assist in establishing the initial perception of service:
– Usage information:
  – Number of users of the service
  – If applicable, a breakdown of function usage by company employees, business partners, the general public, and so on
  – Patterns or hours of usage, including peak times
  – How users access the service (Internet, intranet, extranet, legacy 3270 screens, and so on)
– The deficient and favorable points of current IT service delivery and how they are communicated to the IT organization
– The challenges faced by the business, including any new or updated services on the horizon
– Current issues with the business service functions
Table 2-2 provides a useful template for keeping track of usage information.
Table 2-2 Business service usage and perception
Feature | Time of day | Number of users | Method of access or type of user | Perception
TransactionA | Morning | <num> | Intranet | Good
TransactionB | Noon | <num> | Internet | Slow
TransactionC | Evening | <num> | <method> | Poor
TransactionD | Midnight | <num> | <method> | Excellent
Establishing the expected and desired quality of service
At this stage of the planning phase of SLM implementation, the business owners may define the expected quality of the services to be provided to customers and users. Expectations for the quality of services can be motivated by several factors, for example:
– Retain the existing customer base and attract new customers.
– Cultivate customer loyalty.
– Demonstrate service that is superior to the competition.
Expected quality of service also has an IT perspective, which is likely to be:
– Align the IT organization with the business views.
– Increase the visibility of improvements being made.
– Maximize the potential of systems management tools.
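A time-of-day breakdown like the one in Table 2-2 also explains how two perceptions recorded earlier can both be true: "Servers are available 98% of the time" and "The Web site is rarely available in the evenings." The following Python sketch computes availability over the whole day and over an evening window from the same outage records. The outage data is invented for illustration (a 36-minute outage at 20:00 on 24 of 30 days), and each outage is assumed to fall entirely within the hour in which it starts.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    hour: int      # hour of day (0-23) in which the outage occurred
    minutes: int   # outage duration; assumed not to cross into the next hour


def availability(outages, days, hours):
    """Percentage availability over `days` days, counting only the given hours of day."""
    window_minutes = days * len(hours) * 60
    down = sum(o.minutes for o in outages if o.hour in hours)
    return round(100 * (1 - down / window_minutes), 2)


# Hypothetical measurements: a 36-minute outage at 20:00 on 24 of 30 days.
outages = [Outage(hour=20, minutes=36) for _ in range(24)]

print(availability(outages, days=30, hours=range(24)))      # 98.0 (whole day)
print(availability(outages, days=30, hours=range(18, 22)))  # 88.0 (evening window)
```

The same 864 minutes of downtime yield 98% availability when measured around the clock, but only 88% during the 18:00 to 22:00 window, when the business feels the impact. This is one reason to choose metrics that reflect when and how users actually access the service.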
  • 50. Record these expectations so that you can address them during the assessment phase. Depending on the expectations of the quality of services, you can expect changes and improvements to the existing IT infrastructure. Define desired quality of service objectives that make sense, are measurable, and are achievable. This helps to define the success criteria of the entire SLM implementation.
2.2.3 Assessing the ability to deliver
After you understand the service, assess the current operational environment by examining the IT infrastructure and the existing and planned monitoring capabilities. This brings everyone to the same page and establishes a baseline for measurement. When this is completed, you may begin the implementation.
While information is collected, keep in mind the initial perception of service and the expected quality of service. The goal is to understand the components that provide the business service, and to understand the current IT infrastructure's capability to deliver the services at the expected and desired quality. IT components are at a granular level and should be described in terms of specific applications, servers, and hardware. Management of the service is described in terms of monitoring tools and can include specific monitoring thresholds.
Earlier, this book described the business functions that make up the business service. This section breaks down these functions to help you understand how the IT resources affect them. It looks into the specific applications that are used to provide each function, and at the network, hardware, and operating systems that run those applications.
Analyzing the existing infrastructure
Insufficient capacity of the IT infrastructure to deliver services often leads to bottlenecks, performance problems, and loss of availability, all of which degrade service delivery. Business components were identified in 2.2.2, "Understanding the services" on page 29.
Now you must map these business components to IT components and verify the monitoring environment. Since several IT components make up the service, the capacity of each component must be balanced against the capacity of the other components. Capacity management processes must be in place to provide a precise evaluation of the capabilities of the IT infrastructure. This is a crucial step toward negotiating SLAs. SLM processes require an assessment of the IT infrastructure capacity needed to accommodate the customer requirements that will be recorded in SLAs. After SLAs are negotiated, SLM processes set the targets for the IT infrastructure to deliver, and capacity
  • 51. management processes can report on the performance and throughput achievements for SLA evaluation.
Assessing the existing monitoring capabilities
Review existing monitoring capabilities and upgrade them as necessary. Ideally, do this ahead of, or in parallel with, the drafting of SLAs, so that monitoring can be in place to help validate the proposed targets. It is essential that monitoring matches the customer's true perception of the service. Unfortunately, this is often difficult to achieve. For example, monitoring individual IT resources, such as a server, does not guarantee that the service will be available to the customer. Without monitoring all IT resources in the end-to-end service, you cannot see a true picture.
Monitoring tools collect information about IT resources using predefined measurement metrics. Metrics are standards of measurement, or measurable quantities, that are associated with guaranteed service levels to create SLOs. Metrics evaluate the performance, availability, or utilization of IT resources, for example transaction response time, CPU utilization, and disk utilization. When implementing SLM, IT should take the following steps and choose tools that meet its design specifications:
• Identify the measurement metrics required to measure the IT resources that make up the services.
• Use monitoring tools to provide the measurement metrics that need to be collected.
• Use reporting tools that process the captured data and satisfy all levels of report recipients.
• Use analytical tools that aggregate and analyze the collected SLM data in a manner that offers fast recognition of business impact and proactive response.
• Use administration tools that improve the productivity of SLM operators and users and that integrate the monitoring, reporting, and analytical tools.
Compare this list to the existing systems management and monitoring tools already in place in the IT infrastructure.
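To see why monitoring a single server does not describe the end-to-end service, consider a service that needs every one of its components to be up. If component failures are independent (an assumption that real infrastructures rarely guarantee), the service availability is the product of the component availabilities. The component names and figures below are hypothetical.

```python
from math import prod


def serial_availability(component_availability: dict[str, float]) -> float:
    """Estimated end-to-end availability when the service needs every component up.

    Assumes independent failures; treat the result as an estimate, not a measurement.
    """
    return prod(component_availability.values())


# Hypothetical component availabilities (fractions, not percentages).
components = {
    "web server": 0.999,
    "application server": 0.995,
    "database": 0.998,
    "network": 0.997,
}
print(f"{serial_availability(components):.4f}")  # 0.9890
```

Every component in this example is at or above 99.5%, yet the estimated end-to-end availability is below 99%, which is why the service, not only its parts, needs to be monitored.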
In addition, organize the monitoring data collected by such tools and make it accessible to everybody with a stake in the SLM process. Analytics and reporting tools must be able to present this data in a manner that aligns the service views of both IT and its customers, allowing them to reconcile the customers' perception of service with the service levels delivered by IT.
  • 52. IT wants to understand how resource performance and availability affect service levels and what adjustments are needed to improve service. Customers want to make sure that IT delivers availability and responsiveness to the critical applications that they use to automate their business processes. When their business process is impacted, they want IT to report it accurately so that they can impose the negotiated penalties on IT.
Define a high-level design that provides an assessment of the existing monitoring capabilities as well as any additional monitoring tools and processes. This forms a baseline for measuring the expected quality of services.
Important: Do not include anything in an SLA unless you can effectively monitor and measure it at a commonly agreed point.
2.3 Implementing service level management
A successful implementation of the SLM strategy relies on ongoing communication between an IT organization and business units. SLAs provide business representatives and the IT department with a common language to discuss goals, responsibilities, and management issues relating to IT services.
The planning stage produces a high-level design of the proposed SLM solution. It is based on an understanding of user demands and an IT assessment of the feasibility of meeting customers' requirements for services. As a result, the implementation stage begins with the detailed design for this solution, which defines the SLOs and outlines the solution deployment plan. Based on this high-level design, an IT organization prepares SLOs, constructs SLAs, and negotiates them with users. At the same time, the IT organization begins implementing additional tools and adjusts IT processes as required to support new functions.
2.3.1 Developing service level objectives
An IT organization manages service levels based upon objectives outlined by SLAs. IT drafts SLOs based on business requirements and the IT organization's assessment of its capabilities.
Then it seeks approval from its customers through negotiation. The starting point for SLAs is the business stating which IT services it needs to operate effectively. This may include both the minimum acceptable levels and the desirable levels. The IT department has to assess its capability to deliver at these levels and negotiate with the customers.
  • 53. Achieving, or even approaching, the desirable level may require additional investment and may need to be addressed by a service improvement program. The negotiation stage is likely to be iterative.
SLOs are specifications of a metric that is associated with a guaranteed level of service defined in an SLA. The metrics by which SLOs are defined are often called service level indicators (SLIs). From a business perspective, the most important objectives are the availability and responsiveness of the service that IT provides to the business. Typically, IT responds to these business requirements by quantifying availability and performance:
• Availability: The percentage of the evaluation period when the service was in an available state
• Performance: Usually represented by two SLIs, such as responsiveness (speed) and throughput (volume)
Additional SLOs may include accuracy (whether the service does what it is supposed to do), cost, security, number of incidents, time-to-repair, etc.
SLOs must meet the following criteria before you can include them in SLAs:
• Attainable: The objective is worthless if IT will never be able to meet it.
• Measurable: The objective is worthless if it cannot be measured.
• Understandable: Reported statistics must relate to the user experience.
• Meaningful: The objective must be relevant to all parties.
• Controllable: Do not include objectives that cannot be controlled.
• Affordable: The objective may require additional funding that sponsors are not willing to provide. Additional budget allocation is a business-level decision.
• Mutually acceptable: One party cannot simply dictate the terms of the agreement.
When developing an SLO, an IT organization needs to carefully select measurement metrics that are indicative of this SLO. For example, measuring availability from a user's perspective is not a simple task. If an application is up and running, it does not mean that users can use it.
If IT measures the availability of resources, it does not guarantee that this represents the actual user experience. There is no perfect solution to this problem. Nevertheless, an IT organization must use SLIs that can be directly measured. SLAs must document each chosen SLI that will represent each of the SLOs and specify its data source.
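The availability SLI defined above, the percentage of the evaluation period when the service was in an available state, reduces to simple arithmetic once outages are recorded. The following sketch assumes outages are given as non-overlapping (start, end) pairs and clips them to the evaluation period; the dates, outages, and the 99.5% target are hypothetical, and real SLAs may also exclude agreed maintenance windows.

```python
from datetime import datetime, timedelta


def availability_percent(period_start, period_end, outages):
    """Percentage of the evaluation period during which the service was available.

    `outages` is a list of (start, end) datetimes; each outage is clipped to the
    evaluation period. Outages are assumed not to overlap each other.
    """
    period = (period_end - period_start).total_seconds()
    down = 0.0
    for start, end in outages:
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (period - down) / period


# Hypothetical 30-day evaluation period with two outages.
start = datetime(2004, 11, 1)
end = start + timedelta(days=30)
outages = [
    (datetime(2004, 11, 3, 22, 0), datetime(2004, 11, 4, 1, 0)),    # 3 hours
    (datetime(2004, 11, 20, 9, 0), datetime(2004, 11, 20, 9, 45)),  # 45 minutes
]
sli = availability_percent(start, end, outages)
slo_target = 99.5  # hypothetical target agreed in the SLA
print(f"availability {sli:.3f}% -> {'met' if sli >= slo_target else 'missed'}")
```

Here 3 hours 45 minutes of downtime in a 720-hour period gives about 99.48%, just short of the hypothetical 99.5% target.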
  • 54. 2.3.2 Negotiating on service level agreements
SLOs set up the standards for measurements and determine requirements for monitoring tools. However, before they become part of an SLA contract, an IT organization must reach a mutual understanding with the business units on the SLOs and their targets.
In the process of negotiating SLAs, an IT organization and its customers exchange information and seek reasonable service level targets. The business units must clearly communicate their requirements and explain the business impact if the proposed service is not acceptable. IT must clearly communicate its assessment of the attainable service levels, the proposed SLOs, and their limitations, and explain the costs associated with offering a higher level of service.
When these negotiations are completed, IT must document the agreed-upon SLOs and SLIs. Other components of the negotiated SLA may include:
• Term: Typically one to two years
• Scope: Business description, user locations, transaction volume, service hours
• Limitations: Transaction throughput, concurrent users, funding, etc.
• Remedies: Clearly defined penalties for non-performance; defined bonuses for delivering better than expected service
• Optional services: Current or future services at additional cost
• Exclusions: Clear identification of what is excluded from this SLA
• Service variations: Different levels at different times, maintenance periods, etc.
• Reporting: A relevant, well-understood list of all reports
• Administration: Description of ongoing effort and responsibilities
• Reviews: Validation of SLAs and the SLM process, and negotiation of exceptions, every six months
• Revisions: New SLAs possibly required for changes in technology, workload, staffing, etc.
• Approvals: Assigned authority to approve changes and new SLAs
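The rule that nothing goes into an SLA unless it can be measured, and that each SLO must name its SLI and data source, is easy to check mechanically if SLAs are kept in a structured form. The following Python sketch is an assumption for illustration: the class names, fields, and sample values are hypothetical and cover only a few of the SLA components listed above.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceLevelObjective:
    name: str
    sli: str          # metric that represents the objective, e.g. "availability %"
    target: float
    data_source: str  # where the SLI is measured; empty if not yet agreed


@dataclass
class ServiceLevelAgreement:
    service: str
    term_months: int
    objectives: list[ServiceLevelObjective] = field(default_factory=list)

    def unmeasurable_objectives(self) -> list[str]:
        """Names of SLOs that lack an SLI or a data source and so cannot be monitored yet."""
        return [o.name for o in self.objectives if not o.sli or not o.data_source]


# Hypothetical SLA used only to illustrate the check.
sla = ServiceLevelAgreement(
    service="Online ordering",
    term_months=12,
    objectives=[
        ServiceLevelObjective("Availability", "availability %", 99.5, "transaction monitor"),
        ServiceLevelObjective("Accuracy", "order error rate %", 0.1, ""),
    ],
)
print(sla.unmeasurable_objectives())  # ['Accuracy']
```

An objective flagged this way would either need an agreed data source before signing or be dropped from the SLA.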
  • 55. 2.3.3 Implementing service level management tools
When planning for the SLM implementation, an IT organization analyzes the existing management tools while assessing their capability to provide the measurements required by the proposed SLAs. Any gaps in management tools must be investigated and addressed as part of the SLO development and SLA negotiation activities.
Chapter 1, "Introduction to service level management" on page 3, introduces tools as one of four components of SLM. When implementing SLM, an IT organization must apply a strategy for implementing management tools based on the goals of its SLM program, the requirements for SLA measurements, IT culture and processes, and the overall benefits and cost of implementation.
The effectiveness of SLM management tools depends on how they are applied, and the right combination differs with each organization. Typically, an IT organization wants to reuse existing tools and add more tools as required. Simply having tools is not enough. They need to be applied correctly, which means they must be integrated into a solution. Typically, SLM uses a combination of traditional primary data collectors that capture data directly from the managed environment and secondary data collectors that extract data from primary data collectors. In addition, SLM needs data from monitoring tools that can simulate user experiences.
Implementing service level management monitoring
An IT organization implements monitoring tools as required to manage the hardware and software components it operates: network management tools, performance management tools, incident management tools, etc. These management tools gather data for a range of purposes, one of which is SLM, where the focus is on monitoring the state and performance of IT services.
We previously defined a service as a set of IT resources used in enabling a business process. IT resources can be further grouped into a number of physical domains.
Each physical domain is composed of many subcomponent elements. The following list includes some of the major domains:
• Servers
• Network
• Storage
• Applications
• Transactions
• Databases
• Desktops
  • 56. This simplistic view of IT domains does not account for the fact that each of these domains represents a number of different technologies integrated into complex configurations that can be managed by a variety of tools. However, taken together, these domains control the quality of service. Therefore, it is necessary to install products for monitoring each domain.
From a functional perspective, SLM monitoring of the IT domains should include event monitoring, performance monitoring, usage monitoring, security monitoring, etc. In our illustration of a generic SLM implementation in this chapter, we do not address specific monitoring tools. However, the following chapters demonstrate an example of SLM implementation using IBM Tivoli products.
The primary challenge facing an IT organization when it initiates the SLM program is deciding which products to install and how to integrate them into the most suitable SLM solution. After IT completes the planning and SLA negotiation phases, it usually has a clear understanding of the tools it needs to implement to support SLAs, and it has already decided to acquire the missing tools. When additional products are required, installing, customizing, and integrating the new products into the existing systems management solution can be a significant part of the SLM implementation effort.
Since a service can traverse multiple SLM domains, an IT organization must be able to view and evaluate the collected domain monitoring data for each supported service. In addition, SLM requires monitoring of user experiences of the delivered service through transaction monitors that can generate transactions and record their execution.
Implementing business service management tools
With the SLM focus on service-specific monitoring, an IT organization is forced to change its approach to organizing the data it collects from monitors.
It must now expose the relationships of IT components to business process components and aggregate the monitoring data in a way that shows its impact on the company's business.
Chapter 1, "Introduction to service level management" on page 3, introduces the business service management (BSM) approach and the way to incorporate it into SLM. BSM solutions are designed to improve the effectiveness of SLM through a variety of views, analytics, and automation.
The implementation of BSM is a complex project that takes time and resources, but it simplifies and improves the ongoing management of IT events and service levels in the context of their impact on business. The topic of BSM implementation and its role in improving SLM is covered in greater detail in the remaining chapters of this book.
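Exposing resource-to-business relationships changes how events are ranked: an event matters according to the business services it touches, not only its technical severity. The following sketch is not how any particular BSM product scores events; the resource names, service weights, severity scale, and the multiplication rule are all hypothetical choices made for illustration.

```python
# Hypothetical mapping of IT resources to the business services they support,
# and a business-assigned weight per service (higher means more critical).
RESOURCE_TO_SERVICES = {
    "db01": ["Online ordering", "Payroll"],
    "web03": ["Online ordering"],
    "print07": ["Internal printing"],
}
SERVICE_WEIGHT = {"Online ordering": 10, "Payroll": 6, "Internal printing": 1}
SEVERITY = {"warning": 1, "minor": 2, "critical": 4}


def business_priority(event: dict) -> int:
    """Technical severity scaled by the most critical business service affected."""
    services = RESOURCE_TO_SERVICES.get(event["resource"], [])
    weight = max((SERVICE_WEIGHT[s] for s in services), default=0)
    return SEVERITY[event["severity"]] * weight


events = [
    {"resource": "print07", "severity": "critical"},
    {"resource": "db01", "severity": "minor"},
    {"resource": "web03", "severity": "warning"},
]
for event in sorted(events, key=business_priority, reverse=True):
    print(business_priority(event), event["resource"], event["severity"])
```

With these assumptions, a minor event on the shared database (20) and a warning on a Web server (10) both rank above a critical event on a printer (4) that supports only an internal service.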
  • 57. 2.3.4 Establishing a reporting function
Service level reporting provides IT with a way to communicate the value and quality of its services. Reports are provided in formats that have been documented by SLAs and are therefore well understood by business managers. In addition to reporting service level performance, IT can use these reports to proactively address service difficulties.
The reports must be simple and focus on the specific requirements of SLAs, including reporting achieved SLOs based on actual values of SLIs. The SLA should include a list of reports that IT intends to use for reporting on SLA compliance. For each report, the SLA should document the content, data sources, service level metrics, distribution, and frequency.
In developing reports, an IT organization must categorize recipients based on their area of interest and responsibility. The requirements for each category may differ in perspective, presentation format, frequency, focus, and granularity of information. IT should tailor reports to the recipient level and report only information that customers can understand. However, IT should also keep the supporting information and make it available when customers want to examine the data more closely. The three major categories of SLA report recipients are:
• Executive management
Executives want to see how IT provides value to their business and how the quality of IT services affects business efficiency (including the cost of degraded service in real dollars and lost opportunities). Consequently, executive reports must be highly summarized and outline the quality of IT service experienced internally by business units and externally by customers and business partners. In addition, executive management should understand the impact and cost of degraded services. These reports should use graphs and charts to communicate the overall assessment of the achieved service levels and relate them to business performance.
Any service difficulties experienced should be explained, with references to the supporting documentation as necessary.
• Business management
Business units are interested in understanding how the quality of IT service helps them achieve their business goals, and the impact and cost of degraded service. The service level reports should relate the quality of IT-delivered service to the volume of business transactions, staff productivity, and customer satisfaction. This is not an easy undertaking. When reporting the
  • 58. improved service levels, IT must relate this improvement to increases in business volume, improved productivity, and better customer satisfaction. The same applies to service outages and degradation: IT needs to demonstrate their impact on business performance and costs.
• IT management
The service reports that IT distributes to business management should also be reviewed by all levels of IT management. This helps IT managers understand how component failures and performance degradation affect service levels and impact business performance. In addition, IT management should receive the traditional technology reports on resource outages and performance degradation, as well as on the response time and volume of application transactions. Using time as a correlation factor for both technology and service level reports, IT managers can learn how the technology area that they manage affects the overall quality of IT-delivered services.
In addition to SLA historical reporting (daily detailed reports, weekly summaries, monthly overviews, quarterly business summaries), an IT organization should implement real-time alerting and proactive notification of customers and IT staff. Real-time alerts of service outages and degradation should show the components that cause the impact and which business users are affected, and should communicate the business impact. As explained in Chapter 1, "Introduction to service level management" on page 3, BSM is well suited to perform this function.
2.3.5 Adjusting IT processes to include service level management
When planning for the SLM implementation, an IT organization must review its management processes and identify any adjustments needed to satisfy the requirements of its new mission. This provides an opportunity for IT to improve its responsiveness to business considerations as well as to improve its operation.
Using the business knowledge it acquired during the SLM planning stage, IT can become more proactive in managing resources and establish priorities for its fault management process. As IT implements new monitoring and management tools, it needs to revise the operational procedures and documentation, staff new functions, and train operations personnel. In addition, IT should use the SLM rollout as an opportunity to improve existing management practices in the following areas.
  • 59. Event management
BSM provides facilities that consolidate all enterprise events and provide a single point for event management based on business priorities. This increases the value and productivity of IT operations and service desk personnel. It also prompts IT to establish a control center function that will be responsible for managing events.
Important: There are some key benefits of well-implemented event management processes. For example, IT management and business executives can evaluate the immediate business impact of IT events and understand how they affect SLA compliance. IT operations can prioritize fault management.
Availability management
SLM facilitates the transition from managing IT components to managing IT services and changes the metrics for measuring availability. When the underlying IT resources experience problems or become unavailable, the service may still perform satisfactorily if resources are duplicated. The focus of BSM on service state management significantly improves the understanding of services. It offers more robust capabilities to determine service states based on rules governing the impact of events received from the underlying resources.
Important: When managing availability, an IT organization must focus on identifying the critical events for each service that, by definition, impact the availability of that service. IT operations can significantly improve the availability of IT services through proactive management of critical events.
Capacity management
Monitoring the performance of IT physical domains, defined in 2.3.3, "Implementing service level management tools" on page 38, is a well-established discipline in most IT organizations. When implementing SLM, an IT organization requires additional aggregation of collected performance information to meet SLA obligations for reporting on service level performance.
Important: With BSM facilitating the mapping of resource-to-service relationships, an IT organization can improve its performance management processes by prioritizing the management of IT resources based on their business value. This approach also applies to proactively planning for additional capacity when service levels are in danger.
  • 60. Change management
An IT organization uses the change management process to evaluate the impact of requested changes and, therefore, to reduce the risk of pending requests. Both SLM and BSM can significantly boost the effectiveness of any change management process: SLAs supply the criteria for risk evaluation, and BSM facilitates impact visualization.
Important: An IT organization must adjust its change management process to evaluate implications of the requested changes on agreed service levels and understand their business impact.
Incident management
Some SLAs include SLOs for measuring service desk responsiveness and IT handling of faults. Service levels may include a time value for problem escalations and a mean-time-to-repair value. Every IT organization has some variation of an incident reporting system and escalation procedures.
BSM improves event management and incident recording. It provides capabilities for proactive management of resources in need of repair, and it often offers a bidirectional interface to a number of help desk solutions. The business focus of SLM and BSM enables an IT organization to improve its incident management process through timely recognition of faults, better understanding of their impact, and the added value of SLA reporting.
Important: When implementing SLM, IT needs to integrate its manual processes and the help desk solution it uses for incident management with SLAs and BSM.
Cost management
SLM uses SLAs as a mechanism for governing the use of IT resources to ensure that IT services perform according to the SLA specifications. Customers become aware of cost implications while negotiating SLAs.
An IT organization must balance service cost with service delivery. As the service provider, IT should use service pricing as the mechanism for accounting for resource usage by business units. However, both resource accounting and service charges can become a contentious issue between IT and business units.
Important: When implemented, both SLM and BSM should have input into the cost management process. This enables an IT organization to establish the regulation of resource use based on business value and improve communication with business units when applying charges for services.
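The incident management SLOs mentioned above, a time value for problem escalations and a mean-time-to-repair value, can be computed from the records that an incident reporting system already keeps. The following sketch assumes a hypothetical record layout of (opened, escalated or None, resolved) timestamps and a hypothetical 30-minute escalation limit; real help desk data would need mapping into this shape.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident records: (opened, escalated or None, resolved).
incidents = [
    (datetime(2004, 11, 2, 8, 0), datetime(2004, 11, 2, 8, 20), datetime(2004, 11, 2, 10, 0)),
    (datetime(2004, 11, 9, 14, 0), None, datetime(2004, 11, 9, 14, 30)),
    (datetime(2004, 11, 15, 23, 0), datetime(2004, 11, 16, 0, 30), datetime(2004, 11, 16, 3, 0)),
]


def mttr_hours(records) -> float:
    """Mean time to repair: average of (resolved - opened), in hours."""
    return mean((resolved - opened).total_seconds() / 3600 for opened, _, resolved in records)


def late_escalations(records, limit: timedelta) -> list[datetime]:
    """Opening times of incidents that were escalated later than the agreed limit."""
    return [opened for opened, escalated, _ in records
            if escalated is not None and escalated - opened > limit]


print(f"MTTR: {mttr_hours(incidents):.2f} h")  # MTTR: 2.17 h
print(late_escalations(incidents, limit=timedelta(minutes=30)))
```

Only the third incident, escalated 90 minutes after it was opened, breaches the hypothetical limit.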
  • 61. Application support
Many enterprises have centralized all application development activities and infrastructure management activities under one IT organization. The scenarios in Part 2, "Case study scenarios" on page 195, use this model. IT development organizations typically develop and support such applications. Application support staff work for IT development management and interface with both business and IT support departments. For this reason, application support people can contribute greatly to SLA development, while benefitting greatly from the SLM and BSM implementation.
Application support staff are typically well aware of the business process that IT is automating with its applications. The development organization often possesses knowledge of service parameters such as the number of expected users, the expected response time, etc. In addition, the development organization may provide its own instrumentation to help manage the performance of the applications that it implemented in support of the business. However, application support staff often lack knowledge of the IT infrastructure and rely on IT support and operations staff when researching user problems.
Important: Application support people must be included in both the planning and implementation of the SLM and BSM programs. They should be involved in the design of service compositions for both SLM and BSM and should provide further input during their ongoing application support activities.
2.4 Ongoing service level management program
The SLM implementation program has supplied documentation, management tools, and SLOs to measure against. An IT organization has also completed a review of its processes, identified the required adjustments, and established management reporting in support of SLAs. Now, the success of the SLM implementation hinges on the ongoing program of reporting, management, and improvements that aim to establish more trust between an IT organization and business units.
SLAs provide a vehicle for communication and an instrument for management. IT must use both proactively in the ongoing effort to satisfy the SLM objectives through the following program:
• Maintenance of service definitions
• SLA management via historical reporting
• Priority management of real-time faults
  • 62. 2.4.1 Maintenance of service definitions
As mentioned earlier, while planning for SLM, an IT organization must decompose business processes into IT services. Through interviews, IT obtains the required knowledge and uses it to define services by creating business views of IT resources. The SLM planning stage provides definitions of services and identifies the IT resource associations for each service. The initial business views of IT resources are created, manually or automatically, during the SLM implementation stage.
Note: It is critical to accurately represent business use of IT resources in IT environments where the IT resource configurations and workloads change rapidly. An IT organization must address this issue through automatic discovery of dynamic changes in business-to-resource relationships based on policy rules.
Business views are an important IT asset that must be protected and continuously updated. An IT organization must allocate resources to administer and continuously refine business views. This effort may vary depending on the SLM scope, the tools, and the implementation strategy. Follow these recommendations for ongoing management of business views of IT resources:
• Implement in phases. Begin simple and expand. Refine as necessary.
• Visualize the obtained knowledge of IT physical resources and their dependencies.
• Visualize the obtained knowledge of business process components.
• Construct business views by mapping business process components and IT resources.
• While defining a business view, consider only IT resources that are important for this business view.
• While defining a business view, always understand what it is for and who is going to use it.
With the right tools, an IT organization can significantly improve the productivity of administering business views and their value for both IT and business units.
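Automatic discovery of business-to-resource relationships based on policy rules can be pictured as re-evaluating a set of membership rules whenever the resource inventory changes. The sketch below uses simple name patterns as the policy rules; the view names, patterns, and resource names are hypothetical, and real BSM tools typically support richer rules than name matching.

```python
from fnmatch import fnmatchcase

# Hypothetical policy rules: a business view contains every discovered
# resource whose name matches one of the view's patterns.
VIEW_RULES = {
    "Online ordering": ["web*", "orderdb*"],
    "Payroll": ["hr-app*", "payrolldb*"],
}


def build_views(resources: list[str], rules: dict[str, list[str]]) -> dict[str, list[str]]:
    """Assign discovered resources to business views by evaluating the rules."""
    return {
        view: sorted(r for r in resources if any(fnmatchcase(r, p) for p in patterns))
        for view, patterns in rules.items()
    }


def unassigned(resources: list[str], views: dict[str, list[str]]) -> list[str]:
    """Resources that no rule placed in a view; candidates for administrator review."""
    assigned = {r for members in views.values() for r in members}
    return sorted(set(resources) - assigned)


discovered = ["web01", "web02", "orderdb1", "hr-app3", "payrolldb", "print07"]
views = build_views(discovered, VIEW_RULES)
print(views)
print(unassigned(discovered, views))  # ['print07']
```

Rerunning the rules after each discovery cycle keeps the views current, while the list of unassigned resources shows where the rules themselves may need refining.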
BSM tools are designed to facilitate the creation and ongoing maintenance of business views as well as the rule-based dynamic mapping and management of relationships. Chapter 4, "Planning to implement service level management using Tivoli products" on page 109, addresses the use of business views in IBM Tivoli products in greater detail.
  • 63. The ongoing administration of business views includes the following activities:
• Adding new business views upon requests from the IT change management team
• Adjusting business views upon addition of new resources
• Deleting business views that are no longer needed
• Ongoing maintenance of business views
2.4.2 Service level agreement management via historical reporting
Manual processes for producing SLA reports are labor intensive, time consuming, and prone to error, so most organizations want to automate SLA reporting. They often do this by using custom reporting applications, but these are expensive to build and maintain. The best solution is to use off-the-shelf tools that can be configured to gather the required information and produce SLA reports automatically.
When negotiated, deploy SLAs for continuous monitoring and reporting. During the SLM implementation stage, an IT organization deploys monitoring tools that collect the negotiated measurement data from all IBM Tivoli Monitoring components that are covered by SLAs.
When deployed, monitor and report on SLAs in a timely fashion. The SLA terms include the time and frequency of reporting (for example, within five business days of the first of each month, the end of each month, etc.). Reporting metrics include daily or hourly summaries, depending on the collection cycle.
SLA management relies on data deriving from multiple sources. This data can either be collated through customized procedures (which are difficult and expensive to produce and maintain) or collected centrally with a mechanism such as the Tivoli Data Warehouse, as discussed in Chapter 3, "IBM Tivoli products that assist in service level management" on page 53.
The goal of SLA management is to report the status of services and their compliance with SLAs. Frequency of reporting may vary with the organization and user perception of the current service.
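The daily or hourly summaries mentioned above are aggregations of raw measurement samples into time buckets. The following sketch averages (timestamp, value) samples per hour or per day; the function name, the bucket formats, and the sample response times are hypothetical, and a reporting product or data warehouse would normally perform this step.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Bucket keys for each summary granularity (hypothetical choice of formats).
BUCKET_FORMATS = {"hourly": "%Y-%m-%d %H:00", "daily": "%Y-%m-%d"}


def summarize(samples, granularity: str) -> dict[str, float]:
    """Average (timestamp, value) samples into hourly or daily buckets."""
    fmt = BUCKET_FORMATS[granularity]
    buckets: dict[str, list[float]] = defaultdict(list)
    for timestamp, value in samples:
        buckets[timestamp.strftime(fmt)].append(value)
    return {bucket: mean(values) for bucket, values in sorted(buckets.items())}


samples = [  # hypothetical response times in seconds
    (datetime(2004, 11, 1, 9, 5), 1.2),
    (datetime(2004, 11, 1, 9, 35), 1.8),
    (datetime(2004, 11, 1, 10, 10), 2.4),
    (datetime(2004, 11, 2, 9, 0), 0.9),
]
print(summarize(samples, "hourly"))
print(summarize(samples, "daily"))
```

Averages suit response time; an availability SLI would instead sum downtime per bucket, so the aggregation function should follow the SLI definition recorded in the SLA.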
Here are a few examples of reporting requirements:
– Both business and IT executives may want to review their set of reports at least once a month.
– Business executives may want to be notified every time the service level for their SLAs is breached.
– An IT director may want to be copied on all notifications to business executives and to receive notifications of any trends toward violation within some future period (usually the next 24 or 48 hours).
46 Service Level Management
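The third requirement, notification of trends toward violation, can be illustrated with a deliberately simple linear extrapolation. This is a hypothetical sketch, not the trend analysis performed by IBM Tivoli Service Level Advisor: it assumes downtime is tracked in minutes against the downtime budget that the SLO allows for the period, and that the recent downtime rate continues unchanged over the look-ahead horizon.

```python
def downtime_budget_minutes(slo_pct: float, period_hours: float) -> float:
    """Minutes of downtime the SLO allows over the whole reporting period."""
    return period_hours * 60.0 * (1.0 - slo_pct / 100.0)


def trending_to_violation(downtime_so_far: float, recent_rate_per_hour: float,
                          slo_pct: float, period_hours: float,
                          horizon_hours: float = 24.0) -> bool:
    """True if the recent downtime rate, extrapolated linearly over the
    horizon, would push cumulative downtime past the period's budget."""
    projected = downtime_so_far + recent_rate_per_hour * horizon_hours
    return projected > downtime_budget_minutes(slo_pct, period_hours)


# 99.5% over a 720-hour month allows 216 minutes of downtime.
print(trending_to_violation(180, 1.0, 99.5, 720))                    # False: 204 projected
print(trending_to_violation(180, 1.0, 99.5, 720, horizon_hours=48))  # True: 228 projected
```

The same downtime rate triggers a warning at a 48-hour horizon but not at 24 hours, which is why the choice of look-ahead period matters. A fuller version would also cap the horizon at the time remaining in the period.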
  • 64. Without automation, ongoing SLA management often fails to deliver the intended value despite a well-planned and well-executed implementation. It is unacceptable to business executives when an IT organization takes several weeks to consolidate technical reports into a combined view of service.
2.4.3 Priority management of real-time faults
In the process of planning and implementing SLM, an IT organization defines the services that it provides to automate business processes and documents the objectives for SLM in SLA contracts. According to ITIL, SLM is the continuous process of measuring, reporting, and improving the quality of services; ITIL does not specifically address the management part. You can assume that ITIL's focus is on the traditional management cycle of historical reporting and reviews for managing SLAs, which we addressed in 2.2.2, “Understanding the services” on page 29.
Service definitions align IT resources with the business processes that they support, enabling management of IT resources based on their business value. The status of IT resources changes dynamically as they change state and receive normal and abnormal events. The ability of IT operations to resolve abnormal events (faults) hinges on knowing their impact on business processes. By understanding the business value of IT resources, IT operations can manage real-time faults based on business priorities.
SLM state management should consider several factors before deciding the final state of each service, such as the state and priority of the service components, the importance of events and their number of occurrences, recovery from faults through resource pooling, scheduled outages due to maintenance, components being repaired, and so on.
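A few of the factors listed above can be combined in a small, illustrative rule set. The sketch below assumes three states, excludes components in scheduled maintenance, lets pooled components back each other up, and caps the impact of low-priority components at DEGRADED. The names and rules are hypothetical and do not describe how any Tivoli product computes service state.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class State(IntEnum):
    OK = 0
    DEGRADED = 1
    DOWN = 2


@dataclass
class Component:
    name: str
    state: State
    critical: bool = True          # low-priority components can only degrade the service
    in_maintenance: bool = False   # scheduled outage: excluded from the calculation
    pool: Optional[str] = None     # components in the same pool back each other up


def service_state(components: list[Component]) -> State:
    """Derive the service state from its components (hypothetical rules)."""
    worst = State.OK
    pools: dict[str, list[State]] = {}
    for c in components:
        if c.in_maintenance:
            continue
        if c.pool is not None:
            pools.setdefault(c.pool, []).append(c.state)
            continue
        effective = c.state if c.critical else min(c.state, State.DEGRADED)
        worst = max(worst, effective)
    # A pool is down only when every active member is down; otherwise any
    # failing member merely degrades the service.
    for states in pools.values():
        if all(s == State.DOWN for s in states):
            worst = max(worst, State.DOWN)
        elif any(s != State.OK for s in states):
            worst = max(worst, State.DEGRADED)
    return worst


web = [Component("web1", State.DOWN, pool="web"), Component("web2", State.OK, pool="web")]
print(service_state(web + [Component("db", State.OK)]).name)  # DEGRADED
```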
An improvement in fault management by operations has a direct impact on service levels that are measured by the following SLIs:
– Service availability: A better definition of availability and more granular measurement improve the quality of service levels.
– Component repair time: Faster recognition of problems and better understanding of their impact allow accelerated repairs and improved IT performance.
– Service desk responsiveness: Better understanding of faults, their priority, and their impact allows better communication with users and improves their satisfaction.
  • 65. – Cost of support: Better understanding of faults, their priority, and their impact can significantly increase the productivity of control center personnel and IT support staff.
Fault management by business priorities also improves the quality of IT operations, increases the productivity of root cause analysis, and provides more visibility of IT value. Ongoing, effective priority management of real-time faults is not practical without BSM tools. The remaining chapters of this book provide detailed examples of priority management of real-time events by IBM Tivoli products.
2.5 Continuous improvement
A central theme for the service level manager is continuous improvement of the implemented SLM processes. The improvement process for SLM must reflect the fact that business and IT requirements change constantly, user expectations tend to rise over time, and quality improvement must be proactive rather than reactive.
2.5.1 Improving quality of service levels
The process of improving service levels begins by reviewing the deployment. It is followed by a continuous tuning effort and the periodic adjustment of SLAs to reflect business and IT changes.
Deployment review session
The planning and installation team must review the completeness and accuracy of service levels. The team must analyze the problems that impacted service levels but were not captured by tools. It must also adjust service definitions and measurement thresholds and investigate the need for additional monitors.
Ongoing improvement through tuning
An IT organization is likely to implement an ongoing effort to tune its definitions of services, measurement metrics, metrics data collection, automation policies, and performance of IT resources. In addition, IT can initiate a service level improvement program, which is a more formal project to implement improvement actions derived from periodic reviews.
The initial rollout of SLM often includes a few important but simple SLAs.
This is followed by a continuous expansion of SLAs, which in turn results in new requirements for service definitions, measurement metrics, and monitoring tools.
  • 66. IT management should work with business executives to immediately address any issues of user distrust of the reported service levels and use these issues as an opportunity for additional tuning.
Periodic reviews of service levels
Based on the ITIL definition, the ongoing service level improvement process includes periodic reviews of service achievements and maintenance of SLAs. The service level manager is responsible for facilitating this effort. Analyze the results of ongoing monitoring and reporting of service levels and periodically review them with customers. This is the appropriate time to discuss service achievements and trends, issues of service perception, and opportunities for improvement. Also review the existing SLAs periodically for service completeness and accuracy, as well as the relevance of targeted measurements and objectives.
2.5.2 Improving efficiency of service level management
SLM interacts with other IT processes while providing business-oriented service. For more information, see Chapter 1, “Introduction to service level management” on page 3. The efficiency of SLM is determined by the level of its integration with other IT processes (including tools and skills) and the maturity of its program.
A natural maturation process of an IT organization that has initiated an SLM program involves the following stages:
– Evolution of monitoring (from component based to end-user experience based, and then to service based)
– Management of service levels to reduce the user impact of service degradations
– Proactive fault management based on business value
– Automated control of service to proactively detect and correct problems
– Proactive prediction of future business requirements and the associated resources that are necessary to support the business with the appropriate levels of service
– Integration of service management tools to enable IT users to decompose their business processes, automatically discover all supporting IT components, and review the quality of delivered service
  • 67. 2.5.3 Improving effectiveness of service level management
For IT, taking a proactive approach is the best way to improve the effectiveness of its SLM program. An IT organization must recognize the fact that user expectations and business requirements will continue to increase over time. Another important factor for a proactive approach to SLM is that IT can sustain, rather than repair, service levels, so that:
– External customer revenue, cost savings, and customer satisfaction (corporate image) can be sustained.
– IT can be more efficient and plan problem fixes in a controlled and orderly fashion based on business needs, rather than react to the next problem or to what appears to be the biggest one.
– Customers and internal clients are more loyal.
– SLA penalties are reduced.
Proactive improvement of service level management process
After SLAs are in place, the SLM process has defined service levels to strive for. However, simply reacting to problems and reporting the achieved service levels is the wrong approach. Only proactive improvement can guarantee continuous achievement of service levels. SLM includes the proactive development of the right policies, procedures, organizational structures, and personnel skills to improve service level quality and to ensure that business processes are not affected by any service difficulties.
Continuous improvement of the SLM process must focus on improving relationships with users while adding value to business processes through IT services. Every component of SLM must be examined regularly for improvement opportunities, and any improvement must be proactively communicated to users. It is the responsibility of the service level manager to ensure that corrective actions are proactively developed and executed for all identified improvements. The service level manager plays the central role in facilitating improvement for all aspects of SLM operation.
Activities include improving understanding of business processes, improving and calibrating SLAs, driving improvements in technology and operations, and improving communications with users. Through a proactive approach to SLM, an IT organization can increase its credibility and receive more cooperation from business units.
Proactive response to business changes
Every service level manager must proactively seek information from users about pending changes in the existing business processes and communicate this information to IT management, so that it can adjust IT resources as needed.
  • 68. IT must investigate any deviations in the existing service levels. If it finds that service violations resulted from changes in business volumes or user behavior, IT must proactively communicate its findings to business units and renegotiate service levels as necessary.
IT must also integrate the rollout of new business applications with its change management process and generate change requests for new service definitions and SLOs before deploying these applications in production.
Proactive management of service levels
Change is a constant factor in both business and IT environments. Maintaining a high quality of service requires a significant effort from any IT organization. It must anticipate the impact of changes while proactively improving its management of the existing SLAs, regulating resources, and managing user expectations.
Earlier, this chapter addressed service level improvement activities such as ongoing tuning, periodic reviews, and the service improvement program. The focus of this proactive effort is to ensure the most effective management of the existing SLAs to meet and even exceed the negotiated service levels.
Another aspect that contributes to the improvement of service levels involves the optimization of services, regulation of resources, fault management, performance tuning, and so on. When executed proactively, these operational activities allow IT to maximize resource use in support of SLAs and improve service levels.
Improvement in service levels may lead to increased user expectations of service. A proactive approach to service level improvements allows an IT organization to market its achievements in maximizing the service levels that can be attained at current costs, and to manage user expectations.
Proactive integration of tools and processes
SLM allows an IT organization to integrate a number of ITIL processes while applying business knowledge to managing IT infrastructure.
Appendix A, “Service management and the ITIL” on page 447, describes service management in great detail. The ITIL processes and the tools to support them continue to evolve. Most companies still have significant integration issues with available commercial products while trying to use these products for SLM.
IT must proactively research new technologies and enhance its practices based on the experience of others. An IT organization should always look for new solutions that provide better alignment between IT and the business units and that are more suitable for SLM. These solutions must provide more intelligent analytics, a broader scope of data sources, and visualization of business and IT components and their relationships.
  • 69. Most management solutions today require significant customization. Integrating them with IT processes to provide SLM is a difficult and laborious effort. Chapter 1, “Introduction to service level management” on page 3, introduces a business-oriented approach for managing IT services, or BSM, and the value of its integration with SLM. A proactive approach of process and tools integration around a single set of service definitions can significantly improve the efficiency and the effectiveness of any SLM program.
The remainder of this book demonstrates, via detailed examples and case studies, an SLM solution design that involves monitoring IT resources, monitoring user experiences, event correlation, and BSM automation, analytics, and reporting. Two test cases describe the integration of eight Tivoli products in support of two different SLM initiatives.
  • 70. Chapter 3. IBM Tivoli products that assist in service level management
Chapter 2, “General approach for implementing service level management” on page 23, provides a generic approach to implementing service level management (SLM) processes. This chapter describes the key IBM Tivoli products used to implement them. It includes high-level descriptions of the following products and how they integrate to provide an SLM solution:
– IBM Tivoli Business Systems Manager V3.1
– IBM Tivoli Service Level Advisor V2.1
– Tivoli Data Warehouse V1.2
– IBM Tivoli Monitoring for Transaction Performance V5.3
– IBM Tivoli Enterprise Console V3.9
– IBM Tivoli Monitoring V5.2
© Copyright IBM Corp. 2004. All rights reserved. 53
  • 71. 3.1 IBM Tivoli product mapping
Figure 3-1 shows a high-level representation of the IBM Tivoli products that can help to implement SLM. This chapter considers the two layers of components and describes the products that fit into each layer. The layers are:
– Monitoring and measurement metrics
– Service level management
[Figure 3-1 Product mapping: the Service Level Management layer (Real Time Management: IBM Tivoli Business Systems Manager; Predictive Management: IBM Tivoli Service Level Advisor, Tivoli Data Warehouse) above the Monitoring and Measurement Metrics layer (Availability; Performance; Monitor Systems and Applications / User Experience; Event Correlation and Automation)]
3.1.1 The monitoring and measurement layer
The IBM Tivoli products in this layer monitor and measure the behavior of the IT infrastructure. They address two aspects of systems management:
Availability management
This includes products that monitor software and system resources to determine their availability. These products also provide functionality for event correlation across multiple platforms, assistance with determining the root cause of problems based on information gathered from multiple sources, automatic correction of problems, and automatic notification of support personnel.
  • 72. The IBM products directly relevant to SLM are:
– IBM Tivoli NetView® Family
– IBM Tivoli Enterprise™ Console
– IBM Tivoli Monitoring for Transaction Performance
Performance management
This includes products that measure the internal performance of systems and applications. They also provide information about the experience of end users. The functionality includes continuous monitoring and recording of information, raising alerts when thresholds are exceeded, and gauging user experience by making response time measurements and running synthetic transactions. These products can monitor hardware, databases, and applications. The IBM products directly relevant to SLM are:
– IBM Tivoli Monitoring for Transaction Performance
– IBM Tivoli Monitoring
– IBM Tivoli Monitoring for Databases
– IBM Tivoli Monitoring for Business Integration
– IBM Tivoli Monitoring for Web Infrastructure
3.1.2 The service level management layer
This layer contains components that enable organizations to closely align IT with business goals, meet service level commitments, ensure peak business service performance, and reduce support and licensing costs. They also help customers to focus limited resources on the most important areas of the business. The products in this layer address two aspects of systems management:
Real-time management
This includes products that evaluate the health of business functions in near real time and alert operational personnel of service failures or degradation. The relevant product in this group is IBM Tivoli Business Systems Manager.
Predictive management
This includes products that collect performance and availability metrics and compare them with service level objectives (SLOs). The relevant products are:
– IBM Tivoli Service Level Advisor
– Tivoli Data Warehouse
Chapter 3. IBM Tivoli products that assist in service level management 55
  • 73. 3.2 IBM Tivoli Business Systems Manager
IBM Tivoli Business Systems Manager is part of IBM's business service management (BSM) portfolio of products, which provides intelligent management software to enable businesses to optimize their operational agility. For more information about IBM Tivoli Business Systems Manager, refer to IBM Tivoli Business Systems Manager Getting Started Guide, SC32-9088.
3.2.1 Business goals
Typical business goals addressed by IBM Tivoli Business Systems Manager are:
– Aligning IT operations with business priorities to maximize business value
– Optimizing IT resources to help manage costs
– Maximizing efficiency to drive productivity and revenue
– Optimizing service availability to achieve enhanced customer satisfaction
3.2.2 High level description and main functions
IBM Tivoli Business Systems Manager is a near real-time, event-driven systems management product. It can manage and monitor systems, applications, middleware, and other related systems management components in a business context.
Traditional systems management tools focus on technology and deliver only fragmented views of the health of the enterprise infrastructure. IBM Tivoli Business Systems Manager works in conjunction with IBM and third-party systems management tools to analyze the impact of faults and outages on business services.
IBM Tivoli Business Systems Manager provides your operations technicians with a view of IT infrastructure components as they relate to your overall business. It also provides your executives with a high-level view of the status of critical services in your organization.
Main functions
The main functions of IBM Tivoli Business Systems Manager are:
Console consolidation
IBM Tivoli Business Systems Manager provides a consolidated view of systems management information derived from a wide range of existing IT management solutions and IT platforms. In doing so, it enables you to maintain the value of existing tools while reducing complexity.
For a full list of supported platforms and systems management tools, see IBM Tivoli Business Systems Manager Getting Started Guide, SC32-9088. This list includes:
  • 74. – Distributed systems products
• IBM Tivoli Enterprise Console® 3.7.1 or later
• IBM Tivoli NetView Version 7.1 or later
• IBM Tivoli Monitoring Version 5.1 or later
• IBM Tivoli Monitoring for Database, Application, Business Integration, Web Infrastructure, and Collaboration
• IBM Tivoli Monitoring for Transaction Performance Version 5.1 or later
• BMC Patrol Version 3.4
• Computer Associates Unicenter TNG Versions 2.1, 2.2, and 2.4
• NetIQ AppManager Server Version 4.02
• Hewlett-Packard OpenView Network Node Manager for Solaris and HP-UX
– z/OS products
• IBM Tivoli System Automation for z/OS Version 2.3
• IBM Tivoli NetView for z/OS Version 5.1
• IBM Tivoli Workload Scheduler for z/OS Version 8.1 or later
• IBM Tivoli OMEGAMON® products
• Various third-party schedulers and other systems management products from BMC, Computer Associates, and Allen Systems Group
Monitoring from a business services perspective
IBM Tivoli Business Systems Manager provides monitoring capability for a complex combination of system resources across multiple platforms. As a result, it provides views that reflect the business services being provided across the enterprise.
Executive awareness of service status
By providing executive dashboards that reflect the status of business services, IBM Tivoli Business Systems Manager gives executives in your organization a clear and simple view of the status of their key business services.
Impact analysis and critical path management
IBM Tivoli Business Systems Manager provides views that clearly show the impact of faults in the infrastructure on business services. In doing so, it facilitates prioritization of fault resolution effort based on business impact. It also helps with the identification of single points of failure.
Root cause analysis
The various views and reports available in IBM Tivoli Business Systems Manager can be used to assist the process of root cause analysis.
The Business Impact view shows resources that are affected by a fault and their relation to the resource with the fault. Also, the Event View displays the events that triggered the resource state change.
  • 75. Reporting
IBM Tivoli Business Systems Manager provides standard reports out of the box. It also provides a process to export systems management data to the Tivoli Data Warehouse for analysis.
Basing service level agreements (SLAs) on business services
The close coupling of IBM Tivoli Business Systems Manager with Tivoli Data Warehouse and IBM Tivoli Service Level Advisor enables construction of SLAs based on the availability of business systems using out-of-the-box interfaces.
Visibility of SLA breaches and trends
The Tivoli Data Warehouse and IBM Tivoli Service Level Advisor interfaces also enable SLA breaches and trends to be made visible in executive dashboard views.
Resource discovery
IBM Tivoli Business Systems Manager includes several tools to assist in discovery of resources present in an enterprise to reduce implementation time and costs. See “Resource discovery” on page 61.
3.2.3 Benefits of using IBM Tivoli Business Systems Manager
Table 3-1 summarizes the advantages and business benefits of using the key features of Tivoli Business Systems Manager.
Table 3-1 Benefits and advantages of Tivoli Business Systems Manager features
Features | Advantages | Benefits
Provides business context for IT, enables greater accountability to business user needs, and improves ability to prioritize and optimize | Allows IT staff to view IT resources in the context of critical business services, prioritize actions based on business impact, and make intelligent trade-offs | Provides a business context for IT; enables greater accountability to business user needs; improves ability to prioritize and optimize
Shows the relationship between applications | Allows IT staff to make intelligent trade-offs, to easily spot inefficiencies and problems, and to quickly diagnose the root cause of complex failure scenarios | Increases availability (uptime) of critical business systems
Automatically discovers and builds graphical views of applications | Allows for the placement of discovered resources into containers that represent critical business systems and applications | Speeds implementation time; reduces errors; ensures currency and accuracy of management view
Dynamically adjusts the business system view for components added, modified, or deleted | Automatically keeps the business system view up-to-date, avoiding the problem of manual entry leading to obsolete information displays | Reduces errors and improves productivity
3.2.4 Key concepts in IBM Tivoli Business Systems Manager
To understand Tivoli Business Systems Manager, you must be familiar with the following concepts:
– Business systems
– Business system views
– Work spaces
– Resource discovery
– Event processing and propagation
Business systems
Imagine a Web-based insurance application. The infrastructure for the service may consist of a set of applications running on UNIX and Microsoft® Windows® 2000 servers, some outside the company intranet and others behind firewalls; legacy mainframe database systems; miscellaneous load balancers and other network devices; and diverse other components. Together they deliver the service that customers know as Online Insurance.
An IBM Tivoli Business Systems Manager business system is a logical container or folder that is populated with resources representing IT components. In this example, IBM Tivoli Business Systems Manager represents Online Insurance as a business system that contains icons that represent the resources that deliver the service.
Business systems can be created manually from the console, automatically by giving IBM Tivoli Business Systems Manager a set of rules, or via Extensible Markup Language (XML) files. For full details, see Chapter 4, “Planning to implement service level management using Tivoli products” on page 109.
There are three aspects of a business system:
– Resources: The group of resources that provide the business function
– Relationships: The hierarchical relationship between the resources
– Propagation rules: The method of dealing with events that affect the resources
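To make the container idea concrete, here is a hypothetical sketch of a business system populated by rules rather than by hand. Resources carry a kind and a set of tags, a rule is any predicate over a resource, and nested business systems express the hierarchical relationships. None of these names are IBM Tivoli Business Systems Manager APIs or rule syntax, and propagation rules are left out of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str                           # e.g. "unix_server", "db2_subsystem" (illustrative)
    tags: frozenset[str] = frozenset()


Rule = Callable[[Resource], bool]       # a membership rule is a predicate over resources


@dataclass
class BusinessSystem:
    name: str
    rules: list[Rule] = field(default_factory=list)
    children: list[BusinessSystem] = field(default_factory=list)  # nested business systems
    members: set[Resource] = field(default_factory=set)

    def populate(self, inventory: list[Resource]) -> None:
        """Re-evaluate the rules against the inventory so that added and
        deleted resources are reflected without manual edits."""
        self.members = {r for r in inventory if any(rule(r) for rule in self.rules)}
        for child in self.children:
            child.populate(inventory)


inventory = [
    Resource("websrv01", "unix_server", frozenset({"online-insurance"})),
    Resource("db2prod", "db2_subsystem", frozenset({"online-insurance"})),
    Resource("hrapp01", "windows_server", frozenset({"hr"})),
]
databases = BusinessSystem(
    "Online Insurance databases",
    rules=[lambda r: r.kind == "db2_subsystem" and "online-insurance" in r.tags],
)
online = BusinessSystem(
    "Online Insurance",
    rules=[lambda r: "online-insurance" in r.tags],
    children=[databases],
)
online.populate(inventory)
print(sorted(r.name for r in online.members))     # ['db2prod', 'websrv01']
print(sorted(r.name for r in databases.members))  # ['db2prod']
```

Rerunning populate after the inventory changes is the rule-based counterpart of the "dynamically adjusts the business system view" feature in Table 3-1.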
  • 77. Business systems may be built for different purposes, for example:
– Service based: A business system that contains a set of applications and other resources that support a service such as Internet banking
– Department based: A business system that contains all resources supporting the accounting department
– Technology based: A business system that contains all UNIX servers in the enterprise
– Geographically based: A business system that contains all applications for the Europe, Middle East, Africa (EMEA) region
Business system views
IBM Tivoli Business Systems Manager displays business systems in business system views. These are used to monitor the availability of resources and of the service as a whole. They also help to visualize the hierarchical relationships between the components.
There are several types of business system views for different purposes. They represent the information about business systems in different ways:
– Tree view: Displays resources in a tree format
– Hyperview: Displays resources in a navigable elliptical view with a selected resource as the launch point. You can use this view to quickly navigate complex business systems using the mouse.
– Table view: Displays resources in a table and provides sorting and filtering options
– Topology view: Displays representations of the relationships between resources
IBM Tivoli Business Systems Manager can provide users with views appropriate to their responsibilities. It is a simple matter to configure one view for a specific user, such as the manager of the Web services group, and a different one for a group of users, such as the Internet banking support team.
  • 78. Work spaces
The IBM Tivoli Business Systems Manager systems administrator can design different work spaces for users. The work space setup determines what individual users see when they log on.
The systems administrator must design work spaces carefully to reflect the roles of the people using them. Work spaces must also focus the attention of support staff on the most important business services. A help desk may need a work space that includes a business system view based on the physical organization of systems and applications, whereas a CIO may want a work space that shows all the business processes in the enterprise at a lower level of detail than the help desk.
Resource discovery
Before IBM Tivoli Business Systems Manager can monitor a resource, it must be aware of its existence, understand what type of resource it is, and know where it belongs in the enterprise. Even a medium-sized enterprise contains too many resources to record manually, so IBM Tivoli Business Systems Manager provides several mechanisms for discovering resources:
– Bulk discovery: This runs as a batch job on z/OS systems and sends information about discovered resources to the IBM Tivoli Business Systems Manager database, where Load/Discover scheduled jobs are run to complete the processing. A similar bulk discovery process is provided for Tivoli Workload Scheduler for z/OS, and for distributed systems resources instrumented with monitors that communicate through the IBM Tivoli Business Systems Manager common listener interface, including IBM Tivoli NetView and CA Unicenter TNG.
– Rediscovery: This is similar to bulk discovery, except that resources already in the database are ignored. It is essentially a delta discovery.
– Auto discovery: When enabled, this process automatically discovers certain types of resources, including DB2®, IMS™, and CICSPlex® resources.
Similar script-driven processes are available to drive delta discoveries for resources instrumented through the common listener interface and the set of IBM Tivoli Monitoring products.
– Discovery by event: This process discovers, from messages and exceptions sent to IBM Tivoli Business Systems Manager, resources that were not previously identified. If an event is received for an unknown resource, the discovery process creates the resource and posts the event to it.
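The difference between rediscovery (a delta discovery) and discovery by event can be shown with a hypothetical in-memory registry. Resource keys are assumed to be simple (type, system, name) tuples; the class and its methods are illustrative only and are not part of any IBM Tivoli Business Systems Manager interface.

```python
from __future__ import annotations

ResourceKey = tuple[str, str, str]   # (type, system, name): illustrative key format


class ResourceRegistry:
    """Hypothetical stand-in for the resource database."""

    def __init__(self) -> None:
        self.events: dict[ResourceKey, list[str]] = {}

    def rediscover(self, discovered: list[ResourceKey]) -> list[ResourceKey]:
        """Delta discovery: ignore resources already known, add the rest,
        and return only the newly added keys."""
        new: list[ResourceKey] = []
        for key in discovered:
            if key not in self.events:
                self.events[key] = []
                new.append(key)
        return new

    def post_event(self, key: ResourceKey, message: str) -> bool:
        """Discovery by event: create an unknown resource, then post the
        event to it. Returns True if the resource had to be created."""
        created = key not in self.events
        self.events.setdefault(key, []).append(message)
        return created


reg = ResourceRegistry()
reg.rediscover([("CICS", "SYSA", "CICSPROD"), ("DB2", "SYSA", "DB2P")])
print(reg.rediscover([("CICS", "SYSA", "CICSPROD"), ("IMS", "SYSB", "IMSP")]))
# [('IMS', 'SYSB', 'IMSP')]
print(reg.post_event(("MQ", "SYSB", "QM01"), "queue depth high"))  # True
```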
  • 79. Event processing and propagation
Chapter 4, “Planning to implement service level management using Tivoli products” on page 109, describes in detail how IBM Tivoli Business Systems Manager processes events.
Events are sent to IBM Tivoli Business Systems Manager from both z/OS and distributed systems environments:
– z/OS events are forwarded to IBM Tivoli Business Systems Manager via the Source/390 address space on the z/OS machines.
– Distributed systems events are passed to IBM Tivoli Business Systems Manager via the Tivoli Enterprise Console or the common listener interface.
When an event is forwarded to IBM Tivoli Business Systems Manager, it is associated with the resource representing the real-world object that gave rise to it, for example a CICS® transaction. The resource is included in one or more business systems that form a hierarchy of folders representing services.
The IBM Tivoli Business Systems Manager propagation engine then examines the priority of the event and compares it with the tolerance rates set for the resource. If the tolerance rate is exceeded, the propagation engine takes escalation action by sending a further event (called a child event) to the parent objects in the hierarchy. This process continues iteratively until all escalation steps are considered.
This process is called event propagation. It is the key component of IBM Tivoli Business Systems Manager's ability to assess the business impact of events.
3.2.5 IBM Tivoli Business Systems Manager architecture
Figure 3-2 shows a simplified architecture diagram for Tivoli Business Systems Manager. For more information, see IBM Tivoli Business Systems Manager Getting Started Guide, SC32-9088.
  • 80. [Figure 3-2 Tivoli Business Systems Manager flowchart. Components shown: z/OS (Source/390, Tivoli NetView for z/OS); the TBSM servers (host integration, event handler, history, console, Web console, propagation, database, common listener service, and health monitor servers, plus console agent and health monitor client); Tivoli Data Warehouse; the Java and Web consoles; the Tivoli Management Region (TEC task server, event enablement); and distributed data sources (NetView, IBM Tivoli Monitoring)]
IBM Tivoli Business Systems Manager servers
IBM Tivoli Business Systems Manager is implemented on a set of Intel® servers running Windows 2003 or Windows 2000. The exact number of physical servers required depends on the size and type of enterprise being managed. IBM Tivoli Business Systems Manager Installation and Configuration Guide, SC32-9089, provides guidance on hardware and software prerequisites and physical placement of the following logical servers:
– Database server: This is based on Microsoft SQL Server and hosts the IBM Tivoli Business Systems Manager data repository.
– History server: Actions and events from IBM Tivoli Business Systems Manager are regularly archived to this server for reporting and auditing purposes. Using a separate server for reporting improves the performance of the main database server and speeds up production of reports.
  • 81.
- Console server: This supports IBM Tivoli Business Systems Manager clients using the Java™ console.
- Propagation server: This performs impact analysis on events received by IBM Tivoli Business Systems Manager to determine which business systems are affected. Events are propagated to higher level business system objects in accordance with the business system hierarchy and propagation rules.
- Event handler server: This processes events coming to IBM Tivoli Business Systems Manager from z/OS environments, if these are being managed.
- Host integration server: This is required if IBM Tivoli Business Systems Manager is to process events from z/OS machines that do not have the TCP/IP communications protocol installed. It handles the Systems Network Architecture (SNA)-based communications used on legacy systems. In practice, most client implementations of Tivoli Business Systems Manager do not require this server.
- Web Console application server: This supports clients accessing IBM Tivoli Business Systems Manager with a Web browser-based console. The Web console provides many of the views available to users of the Java console and is suitable for many types of users.
- Health monitor server: This monitors the health and availability of the other IBM Tivoli Business Systems Manager servers and their related components.
3.3 IBM Tivoli Data Warehouse
Tivoli Data Warehouse provides a central repository in which you can store data about your IT infrastructure, including network devices and connections, desktops, hardware, software, events, and other information. Stored data is subsequently analyzed and used to produce reports about the behavior of IT components and services.
Important: Tivoli Data Warehouse is not an independent product. It is delivered free with all Tivoli Data Warehouse-enabled applications. All enabled Tivoli source applications are shipped with the necessary Tivoli Data Warehouse components to import their data into the central data warehouse.
For more information about Tivoli Data Warehouse, refer to Introduction to Tivoli Data Warehouse, SG24-6607.
  • 82. 3.3.1 Business goals
Typical business goals addressed by Tivoli Data Warehouse are to:
- Provide a cost-effective means of storing systems management information
- Provide a basis for analyzing the IT infrastructure to achieve the best business value
- Provide a basis for SLA reporting
3.3.2 High level description and main functions
Using Tivoli Data Warehouse, you can store, in one place, data about your IT infrastructure, including network devices and connections, desktops, hardware, software, events, and other information. Depending on the data stored, you can analyze your IT costs, performance, and other trends across your enterprise. You can also show the value and return on investment (ROI) of Tivoli and IBM software, and identify areas where you can be more effective. Moving data from operational data stores into a data warehouse keeps those stores running efficiently while preserving historical data for analysis over longer periods of time.
Tivoli Data Warehouse comes with database optimizations for the efficient storage of large amounts of historical data and fast access to data for analysis and report generation, as well as the infrastructure and tools necessary for maintaining the data in the warehouse. Tools include the Tivoli Data Warehouse application, IBM DB2 Universal Database™ Enterprise Edition, IBM DB2 Data Warehouse Center, and IBM DB2 Warehouse Manager.
Tivoli Data Warehouse uses an open architecture to store, aggregate, and correlate historical data. This enables you to include data from your own applications and third-party systems management products as well as data from IBM Tivoli products.
If your enterprise supports multiple customers, you can keep the data in a single data warehouse but restrict access rights so that customers can see and work with only their own data and reports. You can also restrict access rights at the level of an individual user. Crystal Enterprise Professional V.9 is included for production of reports.
You can also analyze your data using any product that performs online analytical processing (OLAP), planning, trending, analysis, accounting, or data mining. The user interfaces are available only in English, French, German, and Japanese. However, reports can be translated into other languages, as listed in Installing and Configuring Tivoli Data Warehouse version 1.2, GC32-0744-02.
  • 83. Main functions
There are four main functions within Tivoli Data Warehouse:
- Importing data from source applications: This involves running a source Extract-Transform-Load (ETL) program, commonly referred to as an ETL1, to move operational data from the source location into the central data warehouse. Data is condensed as this is done.
- Preparing data for use in reporting: This involves running a target ETL program, commonly known as an ETL2, to prepare data and move it into a data mart ready for use by the target reporting application.
- Design and production of reports: Apart from producing simple reports, this is done using the functionality of the reporting or business intelligence tools rather than Tivoli Data Warehouse itself.
- Housekeeping: Various housekeeping jobs are run to maintain the database and archive old data at a predetermined point.
Many IBM Tivoli products are delivered with warehouse enablement packs (WEPs), which provide the ETLs needed for the previously listed processes.
The concepts of ETLs and data marts are explained further in 3.3.4, “Key concepts in Tivoli Data Warehouse” on page 67.
3.3.3 Benefits of using Tivoli Data Warehouse
Table 3-2 summarizes the advantages and business benefits of using the key features of Tivoli Data Warehouse.
Table 3-2 Benefits and advantages of Tivoli Data Warehouse features
Features | Advantages | Benefits
Central repository for systems management data | Can correlate and analyze data from various monitors in one place | Added value through cross-platform, business-oriented reports based on an end-to-end view of the enterprise
Data consolidation | Reduced data storage costs and easier data management; a common data model | Cost savings and data consistency for reporting purposes
Open, proven, and out-of-the-box interfaces for many IBM Tivoli products | No need to develop data extraction programs | Cost savings through reduced interface development and testing costs
Being built on a relational database management system (RDBMS) architecture provides a high degree of scalability | Data warehouse can handle data for large enterprises | The warehouse can grow with the organization
  • 84. Features | Advantages | Benefits
Ability to use many analysis and reporting tools | Provides the ability to use the reporting tool of choice | Flexibility and standardization for the organization
Out-of-the-box reports for IBM Tivoli applications | Standard reports delivered with IBM Tivoli applications may be sufficient for many purposes | Reduced cost of designing and producing standard reports
Integration with IBM Tivoli Service Level Advisor | Out-of-the-box interface enables rapid development of SLAs | Rapid development of SLAs based on data in the warehouse
Built-in security | Ability to segregate data for different customers using out-of-the-box functionality | Ability to use one data warehouse for multiple customers to reduce costs and maintenance
3.3.4 Key concepts in Tivoli Data Warehouse
To understand Tivoli Data Warehouse, you need to be familiar with the concepts of ETL programs and data marts.
ETL programs
ETL programs process data in three steps:
1. Extract: Data is extracted from the data source.
2. Transform: Data is validated, transformed, aggregated, and cleansed so that it fits the required format.
3. Load: The processed data is loaded into the target database.
In Tivoli Data Warehouse, there are two types of ETLs, whose operation is shown in the diagram in Figure 3-3:
- Central warehouse ETL: Otherwise known as a source ETL or ETL1, this ETL extracts the data from the source applications and loads it into the central data warehouse.
- Data mart ETL: Otherwise known as target ETL or ETL2, this ETL loads data into data marts and is discussed in the next section.
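To make the three ETL steps concrete, here is a minimal pure-Python sketch of what a central warehouse ETL (ETL1) does conceptually. The source rows, the validation rule (negative response times are discarded), and the hourly averaging are assumptions chosen for illustration; this is not how warehouse pack ETLs are actually implemented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical operational rows from a source application: (host, hour, response_ms)
SOURCE_ROWS = [
    ("web01", 9, 120.0),
    ("web01", 9, 180.0),
    ("web01", 10, -1.0),  # invalid measurement, dropped during transform
    ("web02", 9, 200.0),
]


def extract(rows):
    """Extract: read the raw rows from the data source."""
    return list(rows)


def transform(rows):
    """Transform: validate and cleanse, then aggregate to one average per (host, hour)."""
    groups = defaultdict(list)
    for host, hour, value in rows:
        if value >= 0:  # simple cleansing rule (assumption)
            groups[(host, hour)].append(value)
    return {key: mean(values) for key, values in groups.items()}


def load(aggregates, warehouse):
    """Load: append the processed rows to the target store."""
    for (host, hour), avg in sorted(aggregates.items()):
        warehouse.append({"host": host, "hour": hour, "avg_response_ms": avg})
    return warehouse


central_warehouse = load(transform(extract(SOURCE_ROWS)), [])
print(central_warehouse)
```

Note how the transform step condenses four operational rows into two warehouse rows, which mirrors the remark that data is condensed as it moves into the central data warehouse.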
  • 85. [Figure 3-3 Tivoli Data Warehouse ETLs. Components shown: data source, ETL1, central data warehouse (schema), ETL2, data marts (IBM Tivoli Service Level Advisor SLA data marts and reporting data marts), Web-based reports]
Data marts
Although it is possible to run a query against the entire central data warehouse, this is inefficient because of the large volume and range of data that builds up over time. Instead, data is prepared in advance for use in target applications, such as Crystal Reports, and placed in a data mart.
A data mart is a subset of the historical data that satisfies the needs of a specific department, team, or customer. It is optimized for interactive reporting and data analysis. The format of a data mart is specific to the reporting or analysis tool you plan to use. Each application that provides a data mart ETL creates its data marts in the appropriate format.
The data mart ETL extracts a subset of historical data from the central data warehouse that contains data tailored to and optimized for a specific reporting or analysis task. The data mart ETL is also known as target ETL or ETL2.
3.3.5 Tivoli Data Warehouse architecture
Figure 3-4 shows the high level architecture of the Tivoli Data Warehouse in diagram form. Although Tivoli Data Warehouse can be implemented on the z/OS platform, most implementations are on distributed systems platforms. Only these are discussed in this redbook. For further information about the various possible configurations, see Implementing Tivoli Data Warehouse V 1.2, SG24-7100.
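Returning to the data mart ETL (ETL2) described under "Data marts", the sketch below shows the idea of extracting a customer-specific subset from the central data warehouse and pre-aggregating it so reports do not have to scan the whole warehouse. The row layout, customer names, and daily averaging are hypothetical; real data marts use the format required by the target reporting tool.

```python
from collections import defaultdict

# Hypothetical rows already loaded into the central data warehouse.
CENTRAL_WAREHOUSE = [
    {"customer": "ACME", "resource": "web01", "day": "2004-12-01", "response_ms": 150.0},
    {"customer": "ACME", "resource": "web01", "day": "2004-12-01", "response_ms": 250.0},
    {"customer": "ACME", "resource": "web01", "day": "2004-12-02", "response_ms": 100.0},
    {"customer": "Globex", "resource": "db01", "day": "2004-12-01", "response_ms": 40.0},
]


def build_data_mart(rows, customer):
    """Keep only one customer's rows and pre-aggregate them to daily averages
    per resource, which is all the target report needs."""
    daily = defaultdict(list)
    for row in rows:
        if row["customer"] == customer:
            daily[(row["resource"], row["day"])].append(row["response_ms"])
    return {key: sum(values) / len(values) for key, values in sorted(daily.items())}


acme_mart = build_data_mart(CENTRAL_WAREHOUSE, "ACME")
print(acme_mart)
```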
  • 86. [Figure 3-4 Reporting with Tivoli Data Warehouse. Components shown: TDW 1.2 control center and warehouse (WM) agent; applications' data stores (DB2 UDB EE, DB2/390); ETL1; central data warehouse; ETL2; star-schema data marts; Crystal Enterprise server; Web server (IBM HTTP Server, IIS v4 and v5, iPlanet, Lotus Domino); Web-based reports and Crystal portfolio viewed with IE 5.5 SP2 and 6.0 or Netscape 6.2.3. Platforms shown: AIX, Sun Solaris, HP-UX, Windows NT/2000/2003, OS/390, MVS, and Turbo, Red Hat, and SuSE Linux]
Tivoli Data Warehouse is implemented on a set of Intel or UNIX servers. The exact number of physical servers required depends on the size and type of the enterprise that is being managed. Tivoli Data Warehouse Release Notes Version 1.2, SC32-1399, provides guidance about hardware and software prerequisites, as well as the physical placement of the logical servers.
Figure 3-4 gives an overview of the Tivoli Data Warehouse 1.2 architecture and supported software components. The architecture can consist of the following elements:
- Tivoli Data Warehouse Control Center Server
- One or more central data warehouse databases
- One or more data mart databases
- IBM DB2 warehouse agents and agent sites
- Crystal Enterprise server
The following sections explain each of these elements in detail.
  • 87. Tivoli Data Warehouse Control Center Server
The control center server is the system that contains the control database for Tivoli Data Warehouse. It is the system from which you manage your data. The control database contains metadata for both Tivoli Data Warehouse and the warehouse management functions of IBM DB2 Universal Database Enterprise Edition. There can be only one control center server in a Tivoli Data Warehouse 1.2 deployment.
Source databases
A source database holds operational data to be loaded into the Tivoli Data Warehouse environment. Typically, source databases are application specific, so their number tends to grow as more applications feed a Data Warehouse installation. Most Tivoli products provide a WEP, which makes application-specific data available in a source database. This can be a dedicated warehouse source database, such as the one that comes with IBM Tivoli Monitoring, or an interface to the application’s built-in database, as provided for IBM Tivoli Storage Manager or IBM Tivoli NetView. A WEP for Tivoli products also includes the means to upload data from the source database to the central data warehouse, minimizing the effort required for data collection.
Central data warehouse
The central data warehouse is a set of IBM DB2 databases that contains the historical data for your enterprise. You can have up to four central data warehouse databases in a Tivoli Data Warehouse 1.2 deployment.
Data marts
A separate set of IBM DB2 databases contains the data marts for your enterprise. Each data mart contains a subset of the historical data from the central data warehouse that satisfies the analysis and reporting needs of a specific department, team, customer, or application. You can have up to four data mart databases in a Tivoli Data Warehouse 1.2 deployment. Each data mart database can contain the data for multiple central data warehouse databases.
A WEP for a Tivoli application provides everything needed to fill its data marts with application-specific data.
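The deployment limits stated above (one control center server, up to four central data warehouse databases, up to four data mart databases) can be captured in a small planning check. This is a hypothetical helper, not a product tool: the `Deployment` record is invented for illustration, and the sketch assumes a working deployment needs exactly one control center server and at least one database of each kind, as the "one or more" wording in the architecture list suggests.

```python
from dataclasses import dataclass

# Limits stated for a Tivoli Data Warehouse 1.2 deployment.
REQUIRED_CONTROL_SERVERS = 1
MAX_CENTRAL_WAREHOUSES = 4
MAX_DATA_MART_DATABASES = 4


@dataclass
class Deployment:
    """Hypothetical summary of a planned deployment (not a product file format)."""
    control_servers: int
    central_warehouses: int
    data_mart_databases: int


def check_deployment(plan):
    """Return a list of problems; an empty list means the plan fits the limits."""
    problems = []
    if plan.control_servers != REQUIRED_CONTROL_SERVERS:
        problems.append("exactly one control center server is required")
    if not 1 <= plan.central_warehouses <= MAX_CENTRAL_WAREHOUSES:
        problems.append("use between 1 and 4 central data warehouse databases")
    if not 1 <= plan.data_mart_databases <= MAX_DATA_MART_DATABASES:
        problems.append("use between 1 and 4 data mart databases")
    return problems


print(check_deployment(Deployment(control_servers=1, central_warehouses=2, data_mart_databases=3)))
print(check_deployment(Deployment(control_servers=2, central_warehouses=5, data_mart_databases=0)))
```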
  • 88. Warehouse agents and agent sites
The warehouse agent is the component of IBM DB2 Warehouse Manager that manages the flow of data between data sources and targets that are on different computers. By default, the control center server uses a local warehouse agent to manage the data flow between operational data sources, central data warehouse databases, and data mart databases. You can optionally install the warehouse agent component of IBM DB2 Warehouse Manager on a computer other than the control center server.
Typically, you place an agent on the computer that is the target of a data transfer. That computer becomes a remote agent site, which the Data Warehouse Center uses to manage the transfer of Tivoli Data Warehouse data. This can speed up the data transfer and reduce the workload on the control server.
Crystal Enterprise Server
Crystal Enterprise Professional for Tivoli completely replaces the Reports Interface of Tivoli Enterprise Data Warehouse (TEDW) 1.1. It provides a new mechanism for obtaining the reports provided by the WEPs. You must install and configure a Crystal Enterprise environment before you begin installing Tivoli Data Warehouse 1.2. Tivoli Data Warehouse 1.2 supports only the full stand-alone installation of Crystal Enterprise, in which Crystal Enterprise is installed on a single computer that is already running as a Web server.
Crystal Enterprise depends on a number of software components that must be up and running prior to its installation:
- Operating systems
  – Windows NT®
  – Windows 2000
  – Windows 2003
- Internet browser
  – Internet Explorer
  – Netscape Navigator
- Web servers
  – IBM HTTP Server
  – Microsoft IIS
  – iPlanet Enterprise Server
  – Lotus® Domino®
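The warehouse agent placement rule described under "Warehouse agents and agent sites" (use a remote agent site on the transfer target if one exists, otherwise fall back to the local agent on the control center server) can be written as a tiny planning helper. This is purely illustrative: the host names and the `choose_agent_site` and `plan_transfers` functions are hypothetical and are not part of IBM DB2 Warehouse Manager.

```python
def choose_agent_site(target_host, remote_agent_sites, control_server):
    """Pick the computer whose warehouse agent should manage a transfer.

    Prefer a remote agent site on the transfer target; otherwise use the
    local agent on the control center server (the default behavior).
    """
    if target_host in remote_agent_sites:
        return target_host
    return control_server


def plan_transfers(steps, remote_agent_sites, control_server):
    """Map each (source, target) transfer step to the agent site that would manage it."""
    return {
        (source, target): choose_agent_site(target, remote_agent_sites, control_server)
        for source, target in steps
    }


steps = [("itm_source_db", "cdw_host"), ("cdw_host", "mart_host")]
plan = plan_transfers(steps, remote_agent_sites={"mart_host"}, control_server="control_host")
print(plan)
```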
  • 89. 3.4 IBM Tivoli Service Level Advisor
IBM Tivoli Service Level Advisor provides SLM capabilities for enterprise organizations that need to measure, manage, and report on availability and performance aspects of their internal IT infrastructure. The SLM capabilities of IBM Tivoli Service Level Advisor complement the performance and availability measurement functions of other Tivoli products, such as IBM Tivoli Monitoring for Transaction Performance and IBM Tivoli Business Systems Manager. For more information about IBM Tivoli Service Level Advisor, refer to Introducing IBM Tivoli Service Level Advisor, SG24-6611. This section provides a basic overview of the product, its components, and functions as needed to understand and implement Business Service Management.
3.4.1 Business goals
Typical business goals addressed by IBM Tivoli Service Level Advisor are:
- Provision of SLAs that are meaningful to businesses
- Automation of SLA report production to reduce costs and provide timely report delivery
- Provision of a mechanism to resolve disagreements on SLA achievement
- Provision of early warning of trends toward SLAs being breached
3.4.2 High level description and main functions
Tivoli Enterprise Monitoring and Business System monitoring tools usually store their availability and performance data in their own databases. This data is then moved into the Tivoli Data Warehouse using ETLs, as explained in 3.3.4, “Key concepts in Tivoli Data Warehouse” on page 67. After all the source ETLs have written the latest data into the central data warehouse, the IBM Tivoli Service Level Advisor ETL moves a subset of this data into the SLM measurement data mart. Here it can be processed and analyzed against defined SLOs. For example, an SLA can be based on response-time measurements against a Web application.
IBM Tivoli Monitoring for Transaction Performance measures the response time of the Web site, breaking the service into the associated sub-applications that complete a service transaction. Data is moved to the Tivoli Data Warehouse database, from where IBM Tivoli Service Level Advisor can extract and analyze it using its built-in data-collector interface. It can then determine long-term trends and generate reports showing violations, or trends toward violations, of guaranteed levels of service.
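The two outputs mentioned above, violations and trends toward violations, can be illustrated with the sketch below. It is not IBM Tivoli Service Level Advisor's patent-pending trend analysis: it swaps in an ordinary least-squares straight line over evenly spaced measurement periods, and the weekly sample values and the 150 ms threshold are hypothetical.

```python
from statistics import mean


def slo_violations(samples, threshold_ms):
    """Indices of measurement periods whose value exceeds the SLO threshold."""
    return [i for i, value in enumerate(samples) if value > threshold_ms]


def periods_until_breach(samples, threshold_ms):
    """Fit a least-squares line to the samples (one per period) and estimate how
    many periods after the last sample the trend will cross the threshold.

    Returns None when there are too few samples or the trend is flat or improving.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(samples)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    if slope <= 0:
        return None
    intercept = y_bar - slope * x_bar
    crossing = (threshold_ms - intercept) / slope  # period index where the line hits the threshold
    return max(0.0, crossing - (n - 1))


weekly_avg_ms = [100, 110, 120, 130]  # hypothetical weekly average response times
print(slo_violations(weekly_avg_ms, threshold_ms=150))        # [] : no violations yet
print(periods_until_breach(weekly_avg_ms, threshold_ms=150))  # 2.0 : breach expected in about 2 weeks
```

The point of the second function is the early-warning idea from the text: every sample is still within the SLO, yet the trend already predicts a breach, which gives the service provider time to act.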
  • 90. IBM Tivoli Service Level Advisor helps IT service delivery organizations increase the business value of their delivered services by providing the ability to understand and measure service level attainment within their organization. This service level understanding helps to:
- Maintain productivity and customer satisfaction
- Verify end user service levels
- Analyze historical data to predict future service levels
- Manage costs, and improve planning by assuring offered services
- Measure, manage, and report on availability and performance
- Automate SLM based on SLOs
- Evaluate service delivery based on business schedules
- Provide Web-based customer reports
IBM Tivoli Service Level Advisor depends on the performance and availability data collected by a variety of monitoring and performance tools to deliver SLA reports and identify SLA trends. Figure 3-5 illustrates the flow of data.
[Figure 3-5 Data flow in the IBM Tivoli Service Level Advisor. Components shown: source applications environment (source applications 1 to N, each with a source ETL); TDW central warehouse; ITSLA environment (registration ETL, process ETL, ITSLA measurement data mart, SLM database, ITSLA SLM server, ITSLA reports server, SLM task drivers)]
Service level management life cycle with IBM Tivoli Service Level Advisor
SLM is an ongoing process. Both the service provider and the customer must regularly adjust the SLOs to achieve the best service level at reasonable cost and effort.
  • 91. IBM Tivoli Service Level Advisor supports the full life cycle of the SLM process:
1. Creating the SLA
2. Monitoring and reporting the service level
3. Delivering and reviewing SLA reports
4. Ongoing refinement of SLAs
IBM Tivoli Service Level Advisor offers easy-to-use interfaces, quick and easy customization of features, and default values where appropriate. It is delivered with several additional IBM applications that support its functionality:
- IBM DB2 Universal Database (DB2 UDB) Enterprise Edition: This database is used to store measurement data.
- IBM Tivoli Service Level Advisor warehouse enablement packs (also known as warehouse packs): These include ETL routines both for collecting data from the central data warehouse and for writing data back into the central data warehouse for use by other applications.
- IBM WebSphere® Application Server: This is used by IBM Tivoli Service Level Advisor as the operating environment for the administrative user interface and the reporting interface.
3.4.3 Benefits of using IBM Tivoli Service Level Advisor
Table 3-3 emphasizes the features of IBM Tivoli Service Level Advisor, while focusing on the advantages and benefits associated with them.
Table 3-3 The IBM Tivoli Service Level Advisor summary
Features | Advantages | Benefits
Automated SLA evaluation | Eliminates the process of manually reviewing and correlating component-level reports against customer SLAs | Improves IT resource productivity, and reduces the education and training costs required to support component SLM products
IBM patent-pending trend analysis | Identifies IT service delivery problems before they occur, allowing you to take action to maintain service levels rather than simply report them | Maintains customer productivity and satisfaction with the services they depend on to meet business objectives
Manage service level and business schedules across existing IT infrastructure | Leverages existing systems management applications, and associates service delivery with business operations | Provides business-level definition and management of IT infrastructure, and increases ROI of existing systems management tools
Flexible, Web-based reporting | Identifies problem areas, providing an executive summary and detailed operations status of SLAs | Helps communicate the business value of IT resources and can justify cost expenditures
  • 92. Features | Advantages | Benefits
Tivoli Data Warehouse | Provides an open, extensible aggregation point for all systems management data (including non-Tivoli data), and cross-domain reporting | Leverages business intelligence tools for data mining, and provides an open interface to include additional monitoring data in SLAs
3.4.4 Key concepts in IBM Tivoli Service Level Advisor
To understand IBM Tivoli Service Level Advisor, you need to be familiar with the concepts of offerings, realms, and customers. For a full explanation of these concepts, see Creating SLAs with IBM Tivoli Service Level Advisor 2.1, SC32-1247.
Offerings
An offering is a template used to describe a service, with agreed service levels, that forms the basis for the SLAs in which it is ultimately included. Offerings can be differentiated to provide service level choices to customers, such as Gold, Silver, and Bronze services, or any other naming convention that suggests a unique level of service.
An offering is associated with a business schedule that is defined with one or more schedule periods. Each schedule period is associated with a unique schedule state, such as peak, prime, standard, off hours, and others. Each of these states can be configured to represent a unique level of service for that schedule period. As a result, you can offer a wide range of service levels in your offering, while also providing for scheduled outages for maintenance or other downtime activities.
Realms and customers
IBM Tivoli Service Level Advisor provides mechanisms called realms and customers to segregate data, ensuring that reporting information is made available only to the appropriate people.
Realms
The highest level of segregation is called a realm. A realm contains one or more customers. For example, you may create a realm for all customers in the United States and another realm for customers in Europe.
You might also create a realm for customers in a particular line of business within your organization or another grouping that makes sense for your enterprise. Customers can be associated with more than one realm.
  • 93. Customers
The second level of segregation is called a customer. A customer must be associated with at least one realm. When SLAs are defined in IBM Tivoli Service Level Advisor, they are associated with both realms and customers. When IBM Tivoli Service Level Advisor users are given access to reporting functionality, they are given permission to access specific realms and customers. They are unable to view data related to realms or customers for which they have not been granted permissions.
3.4.5 IBM Tivoli Service Level Advisor architecture
Figure 3-6 shows the high level architecture of the IBM Tivoli Service Level Advisor. The components are described in the following paragraphs. We recommend that you install the components of IBM Tivoli Service Level Advisor inside a firewall if possible.
Figure 3-6 IBM Tivoli Service Level Advisor architecture
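The realm and customer segregation described in 3.4.4 amounts to a two-part permission check: a user sees an SLA only if both its realm and its customer have been granted. The Python sketch below models that rule with hypothetical `SLA` and `User` records; it illustrates the concept and is not the IBM Tivoli Service Level Advisor security model or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SLA:
    """Hypothetical SLA record tagged with the realm and customer it belongs to."""
    name: str
    realm: str
    customer: str


@dataclass
class User:
    """Hypothetical report user with explicit realm and customer grants."""
    name: str
    realms: set = field(default_factory=set)
    customers: set = field(default_factory=set)


def visible_slas(user, slas):
    """Return only the SLAs whose realm AND customer have both been granted."""
    return [s for s in slas if s.realm in user.realms and s.customer in user.customers]


slas = [
    SLA("Web Gold", realm="US", customer="Acme"),
    SLA("Mail Silver", realm="US", customer="Globex"),
    SLA("Web Gold EU", realm="Europe", customer="Acme"),
]
analyst = User("analyst", realms={"US"}, customers={"Acme"})
print([s.name for s in visible_slas(analyst, slas)])  # ['Web Gold']
```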
  • 94. The SLM server
The SLM server performs the main functions necessary for SLM, including:
- Processing SLAs
- Scheduling and performing evaluation and trend analysis of measurement data
- Storing the results of the analysis
- Notifying of violations or trends toward violations of SLAs
SLM reports
The report servlets use the functions of the IBM WebSphere Application Server to obtain SLA results data and generate summary reports in the form of tables and graphs that can be displayed in a Web browser. The enterprise can use these servlets to create customized Web pages for customers, displaying results of evaluation and trend analyses, such as:
- Actual level of service provided
- Number of SLA violations
- Trends toward future violations
SLM administration server
The SLM administration server provides a Web-based interface in a WebSphere environment for:
- Creating offerings and SLAs
- Specifying schedules and defining peak times and other schedule states (such as standard, prime, off hours, and others) for varying levels of service
- Specifying how often evaluation and trend analysis should be performed
- Specifying breach values for metrics associated with offerings
- Managing active SLAs
IBM Tivoli Service Level Advisor databases
IBM Tivoli Service Level Advisor depends on three main databases for its operation:
- The central data warehouse database from Tivoli Data Warehouse
- The SLM database
- The SLM measurement data mart
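The administration tasks listed above (defining schedules with states such as peak or standard, and specifying breach values for offering metrics) can be pictured as a small data model. The sketch below is hypothetical: `SchedulePeriod`, `Offering`, hour-based periods, and a single response-time breach value per state are assumptions, and leaving "off hours" without a breach value is just one way to model scheduled maintenance, not the product's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulePeriod:
    """One period of a business schedule (hypothetical model)."""
    start_hour: int  # inclusive, 0-23
    end_hour: int    # exclusive, 1-24
    state: str       # for example "peak", "standard", "off hours"


@dataclass
class Offering:
    name: str
    periods: list        # SchedulePeriod entries covering the day
    breach_values: dict  # schedule state -> maximum response time in ms (assumption)

    def state_at(self, hour):
        """Return the schedule state in effect at the given hour."""
        for period in self.periods:
            if period.start_hour <= hour < period.end_hour:
                return period.state
        raise ValueError(f"hour {hour} is not covered by the schedule")

    def is_breach(self, hour, response_ms):
        """True if the measurement exceeds the breach value for the state in effect.
        States with no breach value (such as a maintenance window) never breach."""
        limit = self.breach_values.get(self.state_at(hour))
        return limit is not None and response_ms > limit


gold = Offering(
    "Gold",
    periods=[
        SchedulePeriod(0, 8, "off hours"),
        SchedulePeriod(8, 18, "peak"),
        SchedulePeriod(18, 24, "standard"),
    ],
    breach_values={"peak": 2000, "standard": 4000},
)
print(gold.is_breach(10, 2500))  # True: 2500 ms exceeds the peak limit of 2000 ms
print(gold.is_breach(20, 2500))  # False: within the standard limit of 4000 ms
```

A Silver or Bronze offering would reuse the same schedule with looser breach values, which is how one template can express differentiated service levels.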
  • 95. The central data warehouse database
The central data warehouse database component of Tivoli Data Warehouse serves as the main repository for historical data that is used by applications such as IBM Tivoli Service Level Advisor. Tivoli Data Warehouse is the source for resource-related data. It is also where the various Tivoli performance and availability monitoring applications send their data for long-term storage.
The SLM database
The SLM database serves several purposes:
- It stores information from Tivoli Data Warehouse that defines the possible combinations of resources and metrics that are available to the customer for use in SLAs.
- It stores information specific to the definition and management of schedules, offerings, customers, realms, and SLAs.
- It stores the results of the analysis and trend evaluation processes, in which measurement data is compared against the SLOs.
From this information, the customer can view summarized reports that indicate how well services are being delivered.
The SLM measurement data mart
The SLM measurement data mart is the database that contains a subset of the measurement data from Tivoli Data Warehouse that is of interest to IBM Tivoli Service Level Advisor in the evaluation and reporting of SLA conformance. It is updated regularly with the latest metric data from Tivoli Data Warehouse.
3.5 IBM Tivoli Monitoring for Transaction Performance
IBM Tivoli Monitoring for Transaction Performance is a centrally managed suite of software components that monitor the availability and performance of Web-based services and Microsoft Windows applications. For more information about IBM Tivoli Monitoring for Transaction Performance, refer to IBM Tivoli Monitoring for Transaction Performance Administrator’s Guide Version 5.3, GC32-9189. This section provides a basic overview of the product, its components, and functions as needed to understand and implement BSM.
  • 96. 3.5.1 Business goals
IBM Tivoli Monitoring for Transaction Performance typically addresses these business goals:
- Improving customer satisfaction by being aware of the client user experience and resolving issues quickly
- Improving the analysis of faults in applications to enable more rapid repairs
- Providing measurements based on application response times and availability for use in SLAs
3.5.2 High level description and main functions
IBM Tivoli Monitoring for Transaction Performance captures detailed performance data for all of your on demand business transactions. You can use this software to perform the following on demand business management tasks:
- Monitor transactions: You can monitor every step of an actual customer transaction as it passes through the complex array of hosts, systems, and applications:
  – Web and proxy servers
  – Web application servers
  – Database management systems
  – Legacy back-office systems and applications
- Simulate customer transactions: By mimicking the behavior of real users performing standard tasks, you can collect performance data that helps you assess the health of your on demand business components and configurations under different conditions and at different times.
- Reporting: You can produce comprehensive real-time reports that display recently collected data in a variety of formats and from a variety of perspectives. By integrating with Tivoli Data Warehouse, you can store collected data for use in historical analysis and long-term planning.
- Notification of performance issues: You can receive prompt automated notification of performance problems, either directly through a console or by integration with IBM Tivoli Enterprise Console and IBM Tivoli Business Systems Manager.
- Root cause analysis: You can quickly isolate the source of performance problems as they occur, so that you can correct those problems before they produce expensive outages and lost revenue.
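The "Simulate customer transactions" task can be illustrated with a generic synthetic-transaction runner: play back a scripted sequence of user steps, time each one, and report availability and whether the total stayed under a threshold. This is a hypothetical sketch, not the product's record-and-playback engine; each step is modeled as a Python callable, and a step counts as failed if it raises an exception.

```python
import time


def run_synthetic_transaction(steps, threshold_s):
    """Play back a scripted user transaction and time each step.

    `steps` is a list of (name, callable) pairs; a step fails if it raises.
    Returns per-step timings, availability, and whether the total response
    time stayed within `threshold_s` seconds.
    """
    timings = []
    available = True
    start = time.perf_counter()
    for name, action in steps:
        step_start = time.perf_counter()
        try:
            action()
        except Exception:
            available = False
            timings.append((name, None))
            break  # a real user could not continue past a failed step
        timings.append((name, time.perf_counter() - step_start))
    total = time.perf_counter() - start
    return {
        "available": available,
        "total_s": total,
        "within_threshold": available and total <= threshold_s,
        "steps": timings,
    }


def login_down():
    """Hypothetical failing step used to show an availability problem."""
    raise ConnectionError("login service unavailable")


healthy = run_synthetic_transaction(
    [("open home page", lambda: None), ("log in", lambda: None)], threshold_s=5.0
)
broken = run_synthetic_transaction(
    [("open home page", lambda: None), ("log in", login_down), ("search", lambda: None)],
    threshold_s=5.0,
)
print(healthy["available"], broken["available"])  # True False
```

Recording `None` for the failed step and stopping there keeps the result close to what a real user would experience, which is the point of a robotic synthetic transaction.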
  • 97. 3.5.3 Benefits of using IBM Tivoli Monitoring for Transaction Performance
Table 3-4 summarizes the main advantages and business benefits of using the key features of IBM Tivoli Monitoring for Transaction Performance.
Table 3-4 Benefits of IBM Tivoli Monitoring for Transaction Performance features
Features | Advantages | Benefits
Robotic synthetic transactions | Provides a view of the experience of real application users | Enables early identification and resolution of service shortcomings
Transaction decomposition | Goes beyond the “black box” view of an application to understand the component causing service issues; support staff need to know less about the application architecture to identify root causes | Faster identification and resolution of problems with application availability and performance
IBM Tivoli Enterprise Console integration | Enables events to be forwarded to the IBM Tivoli Enterprise Console and acted on by operators | Console consolidation means there is less chance of missing service issues
IBM Tivoli Business Systems Manager integration | Enables the business impact of events to be assessed and enables escalation | Ensures focus on the most important issues based on the business impact of a fault
Tivoli Data Warehouse integration | Enables long-term storage of performance and availability data and supports the use of the data in SLAs created with IBM Tivoli Service Level Advisor | Reduced data storage costs and the creation of meaningful SLAs
3.5.4 Key concepts in IBM Tivoli Monitoring for Transaction Performance
To understand IBM Tivoli Monitoring for Transaction Performance, you must be familiar with the concepts of Application Response Measurement (ARM), record and playback, and Java 2 Platform, Enterprise Edition (J2EE) monitoring. For a full explanation of these concepts, see IBM Tivoli Monitoring for Transaction Performance Administrator’s Guide, GC32-9189.
Application Response Measurement
The ARM application programming interface (API) is the key technology used by IBM Tivoli Monitoring for Transaction Performance to capture transaction performance information. The ARM standard describes a common method for integrating enterprise applications as manageable entities. It allows users to extend their enterprise management tools directly to applications, creating a
  • 98. comprehensive end-to-end management capability that includes measuringapplication availability, application performance, application usage, andend-to-end transaction response time. The ARM API defines a small set offunctions that can be used to instrument an application to identify the start andstop of important transactions.IBM Tivoli Monitoring for Transaction Performance provides an ARM engine tocollect the data from ARM instrumented applications. This is a multithreadedapplication implemented as the tapmagent that exchanges data though an IPCchannel, using the libarm library, with ARM instrumented applications. Data iscollected and aggregated to generate useful information. It is correlated withother transactions, and then thresholds are checked against policies. Data isforwarded to the management server and placed into the database for reportingpurposes.IBM Tivoli Monitoring for Transaction Performance Version 5.3 also provides ageneric ARM component for more transaction monitoring coverage. The genericARM capability enables you to monitor custom ARM-instrumented applications. Note: ARM instrumentation does not support a 63Cbit Java Virtual Machine (JVM).The ARM engine notifies the IBM Tivoli Monitoring for Transaction PerformanceManagement Server of transaction violations, new edge transactions appearing,and edge transaction status changes.The following paragraphs provide an overview of the transaction correlationprovided by IBM Tivoli Monitoring for Transaction Performance. For additionalinformation, including instrumenting applications using ARM, see the IBM TivoliMonitoring for Transaction Performance Administrator’s Guide Version 5.3,GC32-9189.ARM correlation is the method by which parent transactions are mapped to theirrespective child transactions across multiple processes and multiple servers.Each IBM Tivoli Monitoring for Transaction Performance component isautomatically ARM-instrumented and generates a correlator. 
The initial root/parent or edge transaction is the only transaction that does not have a parent correlator. From there, IBM Tivoli Monitoring for Transaction Performance can automatically connect parent correlators with child correlators to trace the path of a distributed transaction through the infrastructure. It provides the mechanisms to easily visualize this through the topology views.
Chapter 3. IBM Tivoli products that assist in service level management 81
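The following minimal Python sketch makes the parent/child correlator idea concrete by rebuilding a transaction path from correlated records. It assumes that each record carries its own correlator and its parent's correlator as plain strings. The TxRecord type, its field names, and the sample data are hypothetical and do not reflect the real ARM correlator format or the product's topology views. Children are ordered by start time, which loosely mirrors sibling transaction ordering.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TxRecord:
    """One measured transaction instance (hypothetical, simplified record)."""
    tx_id: str                # this transaction's own correlator, reduced to a string
    parent_id: Optional[str]  # parent correlator; None only for the root/edge transaction
    name: str
    host: str
    start_ms: int
    duration_ms: int


def find_edges(records: List[TxRecord]) -> List[TxRecord]:
    """Edge transactions are the only ones that carry no parent correlator."""
    return [r for r in records if r.parent_id is None]


def build_children(records: List[TxRecord]) -> Dict[str, List[TxRecord]]:
    """Map each parent correlator to its children, ordered by start time (sibling ordering)."""
    children: Dict[str, List[TxRecord]] = defaultdict(list)
    for r in records:
        if r.parent_id is not None:
            children[r.parent_id].append(r)
    for kids in children.values():
        kids.sort(key=lambda r: r.start_ms)
    return children


def trace(record: TxRecord, children: Dict[str, List[TxRecord]], depth: int = 0) -> List[str]:
    """Follow parent-to-child links from an edge transaction through the infrastructure."""
    lines = [f"{'  ' * depth}{record.name} on {record.host}: {record.duration_ms} ms"]
    for child in children.get(record.tx_id, []):
        lines.extend(trace(child, children, depth + 1))
    return lines


# Hypothetical records, deliberately out of order, as they might arrive from several agents.
SAMPLE = [
    TxRecord("c1", None, "GET /checkout", "web01", 0, 420),
    TxRecord("c3", "c2", "SELECT orders", "db01", 60, 150),
    TxRecord("c2", "c1", "OrderServlet", "app01", 20, 300),
    TxRecord("c4", "c2", "UPDATE stock", "db01", 220, 80),
]

if __name__ == "__main__":
    kids = build_children(SAMPLE)
    for edge in find_edges(SAMPLE):
        print("\n".join(trace(edge, kids)))
```

Calling trace on the single edge transaction in SAMPLE lists the checkout request first, then OrderServlet, then its two database calls. Each entry is indented one level deeper than its parent.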
  • 99. IBM Tivoli Monitoring for Transaction Performance implements the following ARM correlation mechanisms:
Parent-based aggregation: This process collects transaction performance data on the parent of a subtransaction and displays transaction performance relative to its path. This provides the ability to monitor the connection points between transactions. It also monitors path-based transaction performance across farms of servers providing the same function.
Policy-based correlators: A portion of the correlator is used to pass a unique policy identifier. The associated policy controls the amount of data collected and the thresholds associated with that data.
Instance and aggregated performance statistics: This provides both additional metrics and a complete and exact trace of the path taken by a specific transaction.
Parent performance initiated trace: The agent sets the trace flag within the ARM correlator for transactions that are performing outside of their thresholds. This provides for the dynamic collection of instance data across all systems where the transaction executes.
Sibling transaction ordering: This is the ability to determine the order of execution of a set of child transactions relative to each other.
Aggregated correlation: This provides a summary of a transaction over a period of time rather than a record for each and every instance of the transaction.
Record and playback
Record and playback records Web transactions and Microsoft Windows application transactions, which you can play back to assess transaction performance and availability. Performance data helps determine whether a transaction is performing as expected and exposes problem areas of your Web and application environment.
IBM Tivoli Monitoring for Transaction Performance provides two playback components. Each is paired with an application that records transactions.
Synthetic Transaction Investigator (STI) Recorder and STI: The STI Recorder records a sequence of steps for a Web transaction, such as searching for information or purchasing an item from an online supplier. An STI playback policy instructs the STI component to play back the recorded transaction and collect performance data.
Rational® Robot and Generic Windows: The Rational Robot, which is provided with IBM Tivoli Monitoring for Transaction Performance but installed as a separate application, records actions in a Microsoft Windows application.
  • 100. The Generic Windows component plays back a Rational Robot recording to provide timing measurements.
J2EE instrumentation
IBM Tivoli Monitoring for Transaction Performance provides enhanced J2EE instrumentation capabilities. The collection of ARM data generated by J2EE applications is invoked from the management server and controlled by user-configured policies, which are distributed to the management agents. The transactions to monitor are specified using edge definitions, for example, the first URI invoked when using the application. You can define the level of monitoring for each edge.
To monitor a J2EE application server, the computer must be running the IBM Tivoli Monitoring for Transaction Performance management agent. A single management agent can monitor multiple J2EE application servers on its host. IBM Tivoli Monitoring for Transaction Performance J2EE monitoring uses Java byte-code insertion (BCI).
3.5.5 IBM Tivoli Monitoring for Transaction Performance architecture
The IBM Tivoli Monitoring for Transaction Performance management server is a J2EE application deployed onto the WebSphere Application Server platform. Figure 3-7 provides a high-level view of the architecture. IBM Tivoli Monitoring for Transaction Performance has the following physical components:
Management server: This server provides the services and user interface needed for centralized management.
Management agent: These agents are installed on computers across the environment to run discovery operations and collect performance data for monitored transactions.
Store and forward management agent: This component enables the transfer of data across firewalls.
ARM engine: This component handles internal systems management data passed from business applications that have been ARM-instrumented.
The following sections explain each of these components further.
  • 101. Figure 3-7 IBM Tivoli Monitoring for Transaction Performance architecture
The management server
The management server is the control center for the IBM Tivoli Monitoring for Transaction Performance installation. It is shared by all IBM Tivoli Monitoring for Transaction Performance components. The management server collects information from, and provides services to, deployed management agents. The management server is deployed as a standard IBM WebSphere Application Server Enterprise Archive (EAR) file and provides the following functions:
User interface: This interface is accessed via a browser and has many uses, including:
– Creating and scheduling policies to instruct monitoring components to collect performance data
– Establishing acceptable performance metrics or thresholds, and defining notifications for threshold violations and recoveries
– Viewing reports and system events
– Managing schedules
  • 102. Real-time reports: This interface is also accessed by a browser and provides graphical displays of performance data collected by the monitoring and playback components. There are reports to help you assess the performance and availability of your Web sites and Microsoft Windows applications.
Event generation: Application events are generated when performance thresholds are exceeded. System events are generated for system errors and notifications. You can view events and configure event severities to decide what action is taken when they are generated. The management server can send e-mail notification to specified recipients, run a specified script, or forward selected event types to the IBM Tivoli Enterprise Console or as Simple Network Management Protocol (SNMP) traps.
Storage of policies and data: The management server controls a set of databases that store policy information, events, and performance data collected by management agents.
Communication with management agents: The management server uses Web services and the Secure Sockets Layer (SSL) to communicate with the management agents. Management agents upload ARM data to the management server at regularly scheduled intervals (the upload interval). By default, the upload interval is once per hour.
The management agent
Management agents, based on Java Management Extensions (JMX), are installed on computers across your environment. Management agents provide the following functions:
Discovery: This enables automatic identification of incoming Web transactions that may need to be monitored.
Listening and playback monitoring: A management agent can have listening and playback components installed that run policies at scheduled times. The management agent sends any events generated during a listening or playback operation to the management server, where event information is made available in event views and reports.
ARM engine for data collection: A management agent uses the ARM API to collect performance data. Each of the listening and playback components is instrumented to retrieve the data using ARM standards.
Policy implementation: When a discovery, listening, or playback policy is created, an agent group is assigned to run the policy. You define agent groups to include one or more management agents that are equipped to run the same policy. Consider, for example, a consumer banking application that runs on several WebSphere application servers, each associated with a management agent and a J2EE monitoring component. To monitor its performance, you can create an agent group named All J2EE Servers. All of the management agents in the group can run a J2EE listening policy that you create to monitor the banking application.
  • 103. Threshold checking: When performance thresholds in listening or playback policies are exceeded, the management agent sends events to the management server. Events can be set for transactions and, in many cases, for the subtransactions within a transaction (a subtransaction is one step in an overall transaction).
Store and forward management agent
Store and forward can be implemented on one or more management agents (typically only one) to handle firewall situations.
Important: Store and forward cannot work with proxies.
In general, you need one store and forward management agent for each firewall that has to be traversed. Store and forward management agents perform these firewall-related tasks:
Enabling point-to-point connections between management agents and the management server
Enabling management agents to interact with store and forward as though store and forward were a management server
Routing requests and responses to the correct target
Supporting SSL communications
Supporting one-way communications through a firewall
The ARM engine
When you install and configure a management agent, the ARM engine is automatically installed as part of the management agent. The ARM engine and ARM API comply with the ARM 2.0 and 4.0 specifications. The ARM specification was developed to meet the challenge of tracking performance through complex, distributed computing networks. ARM provides a way for business applications to pass information about the subtransactions they initiate in response to service requests that flow across a network. This information can be used to calculate response times, identify subtransactions, and provide additional data to help you determine the cause of performance problems.
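The start/stop pattern described above, and the aggregation that the ARM engine performs, can be sketched as follows. This is an illustrative Python model under stated assumptions. ToyArmAgent, its start/stop/aggregate methods, and its status codes are hypothetical; they are not the ARM 2.0 or 4.0 API, the libarm library, or the tapmagent interface. The clock is injectable so that response times are deterministic when testing.

```python
import time
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Optional


class ToyArmAgent:
    """Hypothetical stand-in for an ARM engine: pairs start/stop calls and aggregates them.
    The class and method names are illustrative only; they are not the ARM 2.0/4.0 API."""

    GOOD, FAILED = 0, 1  # illustrative completion statuses reported on stop()

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._open: Dict[int, tuple] = {}   # handle -> (transaction name, start time)
        self._samples = defaultdict(list)   # name -> [(elapsed seconds, status), ...]
        self._next_handle = 1

    def start(self, name: str) -> int:
        """Mark the start of one transaction instance and return a handle for it."""
        handle = self._next_handle
        self._next_handle += 1
        self._open[handle] = (name, self._clock())
        return handle

    def stop(self, handle: int, status: int = GOOD) -> float:
        """Mark the end of the instance; response time is stop time minus start time."""
        name, started = self._open.pop(handle)
        elapsed = self._clock() - started
        self._samples[name].append((elapsed, status))
        return elapsed

    def aggregate(self, threshold_s: Optional[float] = None) -> Dict[str, dict]:
        """Summarize each transaction over the collection period instead of keeping
        one record per instance; optionally count instances over a threshold."""
        summary = {}
        for name, samples in self._samples.items():
            times = [t for t, _ in samples]
            summary[name] = {
                "count": len(samples),
                "failed": sum(1 for _, s in samples if s != self.GOOD),
                "min": min(times),
                "max": max(times),
                "avg": mean(times),
                "slow": 0 if threshold_s is None else sum(1 for t in times if t > threshold_s),
            }
        return summary
```

For example, suppose two login instances take 0.25 s and 0.75 s, and the second is reported as failed. They aggregate to a count of 2, one failure, an average of 0.5 s, and one instance over a 0.5 s threshold. Keeping only such a summary, rather than every instance, reduces the volume of data that has to be stored and uploaded.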
The Generic ARM component (new in Version 5.3 of IBM Tivoli Monitoring for Transaction Performance) enables you to monitor the performance of any ARM 2.0- or 4.0-instrumented application. You can monitor both ARM-instrumented products from independent software vendors (ISVs) and custom in-house applications.
  • 104. The Generic ARM component can also detect and monitor custom metrics that are recorded from these ARM-instrumented applications. All transaction data collected by the Quality of Service, J2EE, STI, and Generic Windows monitoring components of IBM Tivoli Monitoring for Transaction Performance is collected by ARM.
3.6 IBM Tivoli Enterprise Console
IBM Tivoli Enterprise Console provides a focal point for events coming from monitoring products installed in a distributed systems environment. It is usually associated with implementation of Tivoli Framework products, but it can also handle event information sent using SNMP. For more information about IBM Tivoli Enterprise Console, refer to IBM Tivoli Enterprise Console User’s Guide 3.9, SC32-1235.
3.6.1 Business goals
IBM Tivoli Enterprise Console typically addresses these business goals:
Increasing efficiency of operations staff by providing a single event console
Reducing operational costs by automating fixes to common problems
Providing an effective and automated incident escalation solution
3.6.2 High level description and main functions
The IBM Tivoli Enterprise Console product is a rule-based event management application. It integrates system, network, database, and application management to help ensure the optimal availability of the IT resources in an enterprise.
The main functions of the IBM Tivoli Enterprise Console are:
To provide a centralized, global view of your computing enterprise
To collect, process, and automatically respond to common management events, such as a database server that is not responding, a lost network connection, or a successfully completed batch processing job
To act as a central collection point for alarms and events from a variety of sources, including those from other Tivoli software applications, Tivoli partner applications, custom applications, network management platforms, and relational database systems
  • 105. To forward appropriate events to the IBM Tivoli Business Systems Manager to enable it to determine the business impact of faults
The Tivoli Enterprise Console product helps you effectively process the high volume of events in an IT environment by:
Prioritizing events by their level of importance
Filtering redundant or low-priority events
Correlating events with other events from different sources
Determining who should view and process specific events
Initiating automatic corrective actions when appropriate, such as escalation, notification, and opening trouble tickets
Identifying hosts that are in maintenance mode and automatically grouping events from those hosts in a predefined event group
3.6.3 Benefits of using IBM Tivoli Enterprise Console
Table 3-5 summarizes the main advantages and business benefits of using the key features of IBM Tivoli Enterprise Console.
Table 3-5 Benefits of IBM Tivoli Enterprise Console features
Features | Advantages | Benefits
Event filtering | Events requiring no further action are not displayed on the console | Operators can focus on the significant events
Event correlation | Operators focus on the cause of faults rather than the symptoms | More rapid fault resolution
Automatic escalation | Significant faults that are not noticed or not yet worked on are escalated automatically | Improvement in service availability
IBM Tivoli Business Systems Manager integration | Enables the business impact of events to be assessed and escalated | Ensures focus on the most important issues based on the business impact of a fault
Tivoli Data Warehouse integration | Enables long-term storage of performance and availability data and supports the use of the data in SLAs created with IBM Tivoli Service Level Advisor | Reduced data storage costs and the creation of meaningful SLAs
  • 106. 3.6.4 Key concepts of event groups in IBM Tivoli Enterprise Console
To understand IBM Tivoli Enterprise Console, you need to be familiar with the concept of event groups. This section introduces event groups; you can find a detailed explanation in IBM Tivoli Enterprise Console Installation Guide Version 3.9, SC32-1233.
An event group is a configured logical area of responsibility that is used to notify users that an event matching a specified set of criteria has occurred. An administrator configures event groups using the Java version of the event console. For example, if your network contains a group of computers that are used for critical work, you may want to create an event group that receives events for these critical computers. This logical grouping of events is an event group.
To define an event group, you must specify the selection criteria for the events in the group. This data constitutes an event group filter. An event group filter can include any event attribute except for extended or customer-defined attributes. Table 3-6 lists some of the more commonly used attributes for event group filtering.
Table 3-6 Attributes for event group filtering
Attribute name | Description
Event class | Specifies the class of the event, as assigned by the event source that forwards the event
Origin | Identifies the protocol address or host name of a host from which you want to receive events
Severity | Specifies the severity of the event, from Unknown through Harmless to Fatal
Source | Specifies the type of application that created the event
Status | The status of the event, which can be in various states, including Open, Closed, and Acknowledged
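An event group filter behaves like a predicate over event attributes. The following Python sketch models one using the attributes in Table 3-6. It is not how the Java event console stores or evaluates filters. The attribute keys, the status strings, the sample host addresses, and the severity levels between Harmless and Fatal are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Assumed severity order, lowest to highest. Table 3-6 names Unknown, Harmless, and Fatal;
# the intermediate levels used here are an assumption.
SEVERITIES = ["UNKNOWN", "HARMLESS", "WARNING", "MINOR", "CRITICAL", "FATAL"]


@dataclass
class EventGroupFilter:
    """Hypothetical event group filter over the attributes in Table 3-6.
    An empty set means "any value"; every non-empty criterion must match."""
    classes: Set[str] = field(default_factory=set)
    origins: Set[str] = field(default_factory=set)
    sources: Set[str] = field(default_factory=set)
    statuses: Set[str] = field(default_factory=set)
    min_severity: str = "UNKNOWN"

    def matches(self, event: Dict[str, str]) -> bool:
        criteria = [
            (self.classes, event["class"]),
            (self.origins, event["origin"]),
            (self.sources, event["source"]),
            (self.statuses, event["status"]),
        ]
        # Reject the event if any restricted attribute has a value outside the allowed set.
        if any(allowed and value not in allowed for allowed, value in criteria):
            return False
        return SEVERITIES.index(event["severity"]) >= SEVERITIES.index(self.min_severity)


def event_group(events: List[Dict[str, str]], flt: EventGroupFilter) -> List[Dict[str, str]]:
    """Select the events that belong to the group defined by the filter."""
    return [e for e in events if flt.matches(e)]
```

Consider a "critical computers" group restricted to two hosts, to open or acknowledged events, and to a minimum severity of WARNING. Only events that satisfy all three conditions fall into that group.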
  • 107. 3.6.5 IBM Tivoli Enterprise Console architecture
Figure 3-8 provides a high-level view of the architecture of IBM Tivoli Enterprise Console. The key components are described in the sections that follow.
Figure 3-8 IBM Tivoli Enterprise Console architecture
The IBM Tivoli Enterprise Console event server
The event server is at the heart of the IBM Tivoli Enterprise Console. It provides a centralized location for the management of events in a distributed environment. The event server processes input from event consoles and updates the event database. Event consoles read data from the event database and see the latest status of events as they are updated.
The event server evaluates events against a set of rules to determine whether it should automatically perform any predefined tasks or modify the event. If human intervention is required, the event server notifies the appropriate operator. The operator performs the required tasks and then notifies the event server when the condition that caused the event is resolved.
  • 108. Incoming events are given a unique number and a time stamp as they are entered into the event database. They are then evaluated by the rule engine. If the rule engine is busy, events are buffered and evaluated later. Rules include actions to be taken when an event meets the specified rule conditions, which reduces the amount of interpretation and response required of operators. For example, a particular event may be known to trigger one or more instances of another event. In such a case, a rule can automatically downgrade the severity of the events known to be caused by the triggering event, or close them.
The event server can use rules to delay responses to an event. This can be used to deal with self-correcting faults, so that an operator does not needlessly respond to a problem that will shortly go away. For example, rules can be used to attempt to restart a router and give an operator a low-severity notice. If the attempts to restart the router fail within a designated time period, a rule can cancel further retries and send a higher-severity notice to an operator. If an operator does not respond to an event after a specified period of time, the event server can take additional actions, including sending an e-mail, paging the operator, or sending an e-mail notice to an alternate contact.
You can use the predefined rules that the Tivoli Enterprise Console product provides, or you can create your own. For full information about the predefined rules, see IBM Tivoli Enterprise Console Rule Set Reference Version 3.9, SC32-1282.
You can find information about creating your own rules in IBM Tivoli Enterprise Console Rule Developer’s Guide Version 3.9, SC32-1234.
A rule can specify the following actions, among others:
Correlating events
Responding automatically to events, such as running an application or script
Delaying responses to events
Escalating events
Modifying event attributes
Modifying attributes of other events
Preventing duplicate events from being displayed
Dispatching Tivoli or other administrative actions on resources
Reevaluating a set of events
Discarding an event
Generating a new event
Forwarding an event to another event server
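Several of the rule actions above can be illustrated with a small sketch. The Python below is hypothetical: Tivoli Enterprise Console rules are written in the product's own rule language, not in Python. The event classes, the severity ladder, and the time-out values are assumptions. The sketch shows three of the listed actions. It prevents duplicates from being displayed, correlates an event with its triggering event by closing the caused event, and escalates events that are still open after a time limit.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed severity ladder used for escalation, lowest to highest.
SEVERITIES = ["HARMLESS", "WARNING", "MINOR", "CRITICAL", "FATAL"]


@dataclass
class Event:
    cls: str           # event class, for example a hypothetical "Router_Down"
    host: str
    severity: str
    received: float    # arrival time stamp, in seconds
    number: int = 0    # unique number assigned when the event is stored
    status: str = "OPEN"
    repeat_count: int = 0


class ToyRuleEngine:
    """Hypothetical, much-simplified rule processing; not the TEC rule language."""

    def __init__(self, causes: Dict[str, str]):
        self.causes = causes              # effect class -> class of the triggering event
        self.events: List[Event] = []     # stands in for the event database
        self._numbers = itertools.count(1)

    def _find_open(self, cls: str, host: str) -> Optional[Event]:
        return next((e for e in self.events
                     if e.cls == cls and e.host == host and e.status == "OPEN"), None)

    def receive(self, event: Event) -> Event:
        # Rule: prevent duplicates from being displayed; count them on the original instead.
        original = self._find_open(event.cls, event.host)
        if original is not None:
            original.repeat_count += 1
            return original
        event.number = next(self._numbers)
        # Rule: close an event that is known to be caused by an open triggering event.
        cause = self.causes.get(event.cls)
        if cause is not None and self._find_open(cause, event.host) is not None:
            event.status = "CLOSED"
        self.events.append(event)
        return event

    def escalate_stale(self, now: float, timeout: float) -> List[Event]:
        """Rule: raise by one level the severity of every event still open `timeout`
        seconds after it arrived (each call escalates again)."""
        escalated = []
        for e in self.events:
            if e.status == "OPEN" and now - e.received >= timeout:
                level = SEVERITIES.index(e.severity)
                e.severity = SEVERITIES[min(level + 1, len(SEVERITIES) - 1)]
                escalated.append(e)
        return escalated
```

In this design, the duplicate check runs before numbering, so a repeated event never receives its own number; only the original event's repeat count grows. A real rule base would typically also record which events were already escalated, so that operators are not paged repeatedly.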
  • 109. IBM Tivoli Enterprise Console Event database
The Tivoli Enterprise Console product uses an external RDBMS to store the large amount of event data that is received. The RDBMS Interface Module (RIM) component of the Tivoli Management Framework is used to access the event database.
IBM Tivoli Enterprise Console user interface server
The user interface (UI) server provides communication services between the event consoles and the event server. It automatically updates the event database when, for example, an operator acknowledges an event. The UI server also provides a set of commands that enable an operator to change any event attribute, list the events in a specific event group, and display a message on the operator’s desktop.
IBM Tivoli Enterprise Console Event console
An event console provides the graphical user interface (GUI) used by operators to view and respond to events. The IBM Tivoli Enterprise Console product provides two versions of the event console: a Java version and a Web version. Certain tasks require the Java console, but either version can be used to manage events. The event console provides a window for monitoring events based on event groups. An event group is a set of events that meets certain filter criteria.
The Java event console
Key features of the Java event console include:
Tivoli secure logon for added security
Event information retrieved directly from the database by each event console, for high performance and scalability
Configurable refresh rate
Ability to run third-party or custom scripts and applications from the event console
Ability to run predefined tasks
Ability to configure automated tasks to run when a particular event is received by the event console
Ability to view more help information about an event in a Web page
Automatic resolution of conflicts, for example, when two operators simultaneously attempt to change the status of an event
  • 110. Support of multiple views:
– Configuration view to configure the event consoles
– Summary chart view to show a high-level overview of the health of resources represented by an event group
– Priority view, in which event groups are represented by buttons with their status indicated by color
The Web event console
The Web event console is used to manage events from your Web browser and provides many of the functions available in the Java console. The Web version of the event console organizes the tasks that you can perform in a portfolio, which is titled My Work.
IBM Tivoli Enterprise Console event adapter
An event adapter is a process that typically resides on the same host as a managed source and monitors the source for events. For example, if you want to monitor the Windows event log, you would install the Windows event log adapter on the host. When an event adapter receives information from its source, the adapter formats the information and forwards it to the event server for interpretation and response.
To reduce network traffic and event server workload, you can configure an event adapter to discard selected events instead of forwarding them all to the event server.
Tivoli Event Integration Facility
The Tivoli Event Integration Facility is a toolkit that expands the types of events and system information that you can monitor. You can use it to develop your own adapters that are tailored to your network environment and your specific needs.
Tivoli Enterprise Console gateway
The Tivoli Enterprise Console gateway receives events from TME® and non-TME adapters and forwards them to an event server. The Tivoli Enterprise Console gateway provides the following benefits:
Greater scalability, which allows you to manage sources with less software running on the endpoints
Improved performance of the event server
Simple deployment of adapters and updates
Event correlation and filtering closer to the sources, decreasing the amount of network traffic
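When an event adapter discards events, as described for event adapters above, the filtering decision is made before anything crosses the network. The following minimal Python sketch rests on several assumptions: make_adapter, the WinLog_ class prefix, the raw record fields, and the dictionary-based discard rules are all hypothetical. Real adapters are configured through their own configuration files, which this sketch does not attempt to model.

```python
from typing import Callable, Dict, Iterable, List


def make_adapter(discard_rules: List[Dict[str, str]],
                 forward: Callable[[Dict[str, str]], None]) -> Callable[[Iterable[Dict[str, str]]], int]:
    """Build a toy adapter: format each raw source record as an event and
    forward it unless it matches one of the discard rules."""

    def matches(event: Dict[str, str], rule: Dict[str, str]) -> bool:
        # A rule matches when every attribute it names has the given value.
        return all(event.get(key) == value for key, value in rule.items())

    def handle(raw_records: Iterable[Dict[str, str]]) -> int:
        forwarded = 0
        for raw in raw_records:
            # Hypothetical mapping from a Windows-event-log-style record to event attributes.
            event = {
                "class": "WinLog_" + raw["type"],
                "origin": raw["host"],
                "msg": raw["text"],
            }
            if any(matches(event, rule) for rule in discard_rules):
                continue  # discarded at the source: no network traffic, no event server work
            forward(event)
            forwarded += 1
        return forwarded

    return handle
```

Here, forward stands in for sending the event to the event server or a gateway. In a test it can simply be a list's append method, which makes it easy to check which events would have left the host.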
  • 111. Adapter Configuration Facility
The Adapter Configuration Facility provides a GUI to configure and distribute TME adapters. You can use the Adapter Configuration Facility to create profiles for adapters and set adapter configuration and distribution options.
Tivoli NetView
IBM Tivoli NetView provides the network management function for the IBM Tivoli Enterprise Console product. It monitors the status of network devices and automatically filters and forwards network-related events to IBM Tivoli Enterprise Console.
3.7 IBM Tivoli Monitoring
IBM Tivoli Monitoring provides automated monitoring of essential IT system resources. For more information about IBM Tivoli Monitoring, refer to IBM Tivoli Monitoring User’s Guide Version 5.1.2, SH19-4569-03.
3.7.1 Business goals
Typical business goals addressed by IBM Tivoli Monitoring are:
Provision of high-quality services
Proactive monitoring of services
Getting the best value from the IT infrastructure
3.7.2 High level description and main functions
IBM Tivoli Monitoring applies preconfigured best practices to the automated monitoring of essential IT system resources. The application detects bottlenecks and other potential problems, provides for automatic recovery from critical situations, and eliminates the need for system administrators to scan manually through extensive performance data. IBM Tivoli Monitoring integrates seamlessly with other Tivoli availability solutions, including IBM Tivoli Business Systems Manager and IBM Tivoli Enterprise Console. It was previously called Tivoli Distributed Monitoring (Advanced Edition).
Most features of IBM Tivoli Monitoring can be used as supplied, or modified using the GUI or command line interface (CLI) provided. The main features of IBM Tivoli Monitoring are:
An off-the-shelf solution for monitoring Windows, UNIX, Linux®, and OS/400® systems, with data collection and problem analysis performed locally
  • 112. Ready-to-use resource models that report on specific aspects of a system’s status; for example, the Process resource model provides information about the status of processes, CPU usage, and so forth
The ability to add resource models to a Tivoli profile, which can be distributed to multiple systems simultaneously
The ability to modify resource models, for example, by changing threshold levels to match specific requirements
The ability to view both real-time and historical data for any system from a centralized monitoring application, called the Web Health Console, which is supplied with the product
The ability to send the results of data collection and analysis to the IBM Tivoli Enterprise Console or to the IBM Tivoli Business Systems Manager
The ability to specify automatic corrective or preventive actions to resolve situations that could develop into real problems
The ability to schedule monitoring to take place at user-specified times
A heartbeat function that regularly checks the availability and status of attached endpoints and makes the information available to the IBM Tivoli Enterprise Console server, IBM Tivoli Business Systems Manager, or the Tivoli Monitoring Notice Group
3.7.3 Benefits of using IBM Tivoli Monitoring
Table 3-7 summarizes the main advantages and business benefits of using the key features of IBM Tivoli Monitoring.
Table 3-7 Benefits and advantages of IBM Tivoli Monitoring features
Features | Advantages | Benefits
Out-of-the-box resource models | Little or no configuration required to start monitoring | Rapid ROI on implementation
Heartbeat function | Rapid and automatic notification of resources that cannot be contacted | More responsive fault resolution, leading to increased customer satisfaction
Web Health Console | Ability to view real-time and historical data for a resource | Better informed problem analysis
IBM Tivoli Enterprise Console integration | Enables events to be forwarded to IBM Tivoli Enterprise Console | Console consolidation means less chance of missing service issues
IBM Tivoli Business Systems Manager integration | Enables the business impact of events to be assessed and escalated | Ensures focus on the most important issues based on the business impact of a fault
Tivoli Data Warehouse integration | Enables long-term storage of performance and availability data and supports the use of data in SLAs created with IBM Tivoli Service Level Advisor | Reduced data storage costs and the creation of meaningful SLAs
  • 113. 3.7.4 Key concepts in IBM Tivoli Monitoring
To understand IBM Tivoli Monitoring, you need to be familiar with the concepts presented in the following sections.
Resource models
In IBM Tivoli Monitoring terminology, a resource model is defined as “the logical modeling of one or more resources, along with the logic on which cyclical data collection, data analysis, and monitoring are based.” In practical terms, a resource model is a prebuilt set of rules that IBM Tivoli Monitoring uses to monitor a resource. It is installed on a monitored system, for example a server, and can take corrective action or send an event if it detects an exception condition.
IBM Tivoli Monitoring provides a range of out-of-the-box, predefined resource models that specify which resource data is accessed from the system at runtime and how this data is processed. For example, the Process resource model obtains data related to processes running on the system. Performance data is automatically collected by the resource model and processed by an appropriate algorithm to determine whether the system is performing to your expectations.
Generally, you can use the resource model default values and still obtain useful data. However, if necessary, you can customize the resource models to suit your requirements or even build your own resource models using the IBM Tivoli Resource Model Builder. For details about the resource models supplied with the product, see IBM Tivoli Monitoring Version 5.1.2 Resource Model Reference Guide, SH19-4570-03.
For guidance about creating resource models, see IBM Tivoli Resource Model Builder Version 1.1.3 User’s Guide, SC32-1391-02. Cycles and thresholds Resource models run on a cyclical basis. A resource model installed at an endpoint gathers data at regular intervals, known as cycles. The duration of a cycle is the cycle time. A resource model with a cycle time of 60 seconds gathers96 Service Level Management
  • 114. information every 60 seconds. The data collected is a snapshot of the status of the resources specified in the resource model. Each of the supplied resource models has a default cycle time, which you can modify.
Each resource model defines one or more thresholds. A threshold is a named property of the resource with a default value that you can modify in the customization phase. Typically, the value specified for a threshold represents a significant reference level of a performance-related entity. If the level is exceeded or not reached, the operator or system administrator should be notified.

Indications
Each resource model generates an indication if certain conditions implied by the resource model’s thresholds are not satisfied in a given cycle. Each resource model has its own algorithm to determine which combinations of thresholds should generate an indication.
Indications may be generated in either of the following circumstances:
- A single threshold is exceeded: For example, in the Windows Process resource model, the Process High CPU indication is generated when the High CPU Usage threshold is exceeded.
- A combination of two or more thresholds is exceeded: For example, in the Windows Logical Disk resource model, a High Read Bytes per Second indication is generated when both of the following thresholds are exceeded:
  - The number of bytes transferred per second (written or read) exceeds the High Bytes per Second threshold.
  - The percentage of time that the selected disk drive spends servicing read or write requests exceeds the High Percent Usage threshold.

Occurrences and holes
IBM Tivoli Monitoring resource models do not only look for conditions that exceed thresholds once. They can also look for a pattern of repeats over time. An occurrence is a cycle during which an indication occurs for a given resource model.
A hole is a cycle during which an indication does not occur for a given resource model.
Resource models can compare a series of measurements with a given pattern of occurrences and holes to determine whether further action is needed. This approach provides much greater flexibility and avoids raising events prematurely. It is explained in detail, with examples, in IBM Tivoli Monitoring Version 5.1.2 Resource Model Reference Guide, SH19-4570-03.
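The indication and occurrence/hole logic described above can be sketched as follows. This is a minimal illustration, not IBM Tivoli Monitoring code: the threshold values, the `DiskSample` type, and the matching rule (count occurrences backward from the most recent cycle, and give up once more than `holes` consecutive holes are seen) are assumptions made for this example. The Resource Model Reference Guide describes the product's real behavior.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """One cycle's snapshot of a logical disk (hypothetical fields)."""
    bytes_per_sec: float
    percent_usage: float


# Hypothetical threshold values; real defaults come from the resource model
# and can be changed in the customization phase.
HIGH_BYTES_PER_SEC = 1_500_000.0
HIGH_PERCENT_USAGE = 80.0


def high_read_bytes_indication(sample: DiskSample) -> bool:
    """Combination rule: both thresholds must be exceeded in the same cycle."""
    return (sample.bytes_per_sec > HIGH_BYTES_PER_SEC
            and sample.percent_usage > HIGH_PERCENT_USAGE)


def should_send_event(indications: list[bool], occurrences: int, holes: int) -> bool:
    """Decide whether the recent pattern of occurrences and holes warrants an event.

    indications[i] is True if cycle i produced an indication (an occurrence)
    and False otherwise (a hole). Walking back from the newest cycle, count
    occurrences; stop if more than `holes` consecutive holes are seen.
    """
    seen = 0
    consecutive_holes = 0
    for indicated in reversed(indications):
        if indicated:
            seen += 1
            consecutive_holes = 0
            if seen >= occurrences:
                return True
        else:
            consecutive_holes += 1
            if consecutive_holes > holes:
                return False
    return False


if __name__ == "__main__":
    cycles = [DiskSample(2e6, 90), DiskSample(1e6, 95),
              DiskSample(2e6, 85), DiskSample(2e6, 90)]
    flags = [high_read_bytes_indication(s) for s in cycles]
    print(flags)                            # [True, False, True, True]
    print(should_send_event(flags, 3, 1))   # True: one hole is tolerated
    print(should_send_event(flags, 3, 0))   # False: no holes allowed
```

Raising `holes` makes the rule more tolerant of intermittent conditions; raising `occurrences` delays the event and filters out one-off spikes.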
  • 115. The heartbeat function
In addition to the monitoring processes described earlier, IBM Tivoli Monitoring operates a heartbeat function. This function monitors the basic system status of endpoints attached to the gateway on which it is enabled. In essence, it checks regularly whether resources can be reached in the network. If they cannot, events may be sent to IBM Tivoli Enterprise Console, IBM Tivoli Business Systems Manager, and the IBM Tivoli Monitoring Notice Group.

3.7.5 IBM Tivoli Monitoring architecture
Figure 3-9 shows a high-level view of the architecture of IBM Tivoli Monitoring. The key components are described in the sections that follow.
Figure 3-9 IBM Tivoli Monitoring components
  • 116. The IBM Tivoli Monitoring Base component
Install this component on the Tivoli management region server and on all gateways with endpoints that you want to monitor. It provides a GUI and a CLI that are available at both the server and the gateway, and you can control all functions of the product from either node. You can also configure the component to operate the heartbeat function for all endpoints directly attached to the system on which it is installed.

IBM Tivoli Monitoring Web Health Console
The Web Health Console is the Web-based graphical interface for Tivoli Monitoring. It allows you to view real-time information about a specific problem and to check the status (or health) of a set of endpoints. You can use the Web Health Console to work with real-time data or with historical data that was previously logged to a local database.

IBM Tivoli Monitoring Endpoint component
The endpoint component, which requires a Tivoli management agent, performs resource management through one or more resource models that are distributed to the endpoint with a Tivoli Monitoring profile. The endpoint component is installed automatically when a Tivoli Monitoring profile is distributed to the endpoint for the first time.

The IBM Tivoli Monitoring TBSM Adapter
This component feeds discovery information and IBM Tivoli Monitoring events to IBM Tivoli Business Systems Manager.

The Gathering Historical Data component
This component enables IBM Tivoli Monitoring to use Tivoli Decision Support for Server Performance Prediction (Advanced Edition). It uses data collected by specific IBM Tivoli Monitoring resource models to populate a database on the Tivoli server where it is installed. The collected data is aggregated every 24 hours and added to the IBM Tivoli Monitoring database.

The Tivoli Data Warehouse Support component
This component enables the integration of IBM Tivoli Monitoring with Tivoli Data Warehouse.
Getting data into Tivoli Data Warehouse enables more sophisticated data analysis and makes it possible to use IBM Tivoli Monitoring data in SLAs with IBM Tivoli Service Level Advisor.
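The 24-hour aggregation performed by the Gathering Historical Data component can be pictured with a short sketch. The statistics kept (minimum, maximum, and average per calendar day) and the input format, a list of (timestamp, value) pairs, are assumptions for illustration; the component's actual aggregation rules and database layout are not described here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_daily(samples):
    """Roll raw (timestamp, value) samples up into one row per calendar day.

    Returns a dict mapping each date to a (minimum, maximum, average) tuple,
    ordered by date.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[timestamp.date()].append(value)
    return {day: (min(values), max(values), mean(values))
            for day, values in sorted(buckets.items())}


if __name__ == "__main__":
    raw = [
        (datetime(2004, 12, 1, 10, 0), 10.0),
        (datetime(2004, 12, 1, 14, 0), 30.0),
        (datetime(2004, 12, 2, 1, 0), 5.0),
    ]
    for day, (low, high, avg) in aggregate_daily(raw).items():
        print(day.isoformat(), low, high, avg)
```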
  • 117. 3.8 Bringing it all together in support of SLM processes
So far, this chapter has provided an overview of the IBM Tivoli products involved in supporting the implementation of SLM processes. This section provides a technical description of how you can use these products to support the implementation of SLM processes.
IBM Tivoli products focus on specific areas of expertise and provide a wide range of features unmatched by any other vendor. Together they are well suited to address every stage of the SLM process illustrated in Figure 3-10.
[Figure 3-10 diagram. Layer labels: Identify, Negotiate agreements, Monitoring (metrics and events), Visualization, and Management (SLM and BSM analytics). Other labels include business units, IT development, IT operations, user experiences, transactions, resources, SLA/OLA/UC reporting, historical performance reporting, real-time event management, availability management, and automation.]
Figure 3-10 An integrated view of SLM, BSM, and monitoring in process context
How can you integrate the existing Tivoli products to maximize their value in support of the process illustrated in Figure 3-10? Because software products are simply tools that support the processes deployed by an IT organization, and because solutions vary with each IT organization, the following sections outline the generic integration approach represented by Figure 3-10.
  • 118. The integration approach addresses the following elements:
- Service definitions
- Real-time monitoring
- Historical monitoring
- Fault management
- SLA reporting and alerting
- Problem and change management

3.8.1 Service definitions
SLM requires an IT organization to establish service definitions by cataloging IT services and identifying the resources used by each IT service. Service definitions must reflect the actual relationships between IT services and resources.
The real benefit of IBM Tivoli Business Systems Manager comes from the ability to create collections of resources that represent business systems, such as key business processes and applications. Tivoli Business Systems Manager discovers IT resources and relationships, and allows an IT organization to construct business systems and map resources and their associated events to those business systems.
Tivoli Business Systems Manager uses two different methods to discover resources and their relationships as they exist in the real world. The first method is a set of explicit discovery routines that periodically scan a particular environment and return the components within that environment. The second method listens for and processes incoming events that signal new resources within the environment, and then creates the corresponding resources.
The Tivoli Business Systems Manager object model maps discovered resources and their relationships hierarchically, as they exist in an IT infrastructure. This physical resource pool becomes the source for business system construction, which enables management by business services. The object model includes definitions for many of the thousands of different resource types that can be found within an IT infrastructure, and it can be extended to include additional resource types.
Business systems can contain any type of resource and be organized in any manner that suits user needs.
For example, business systems can model resources within a service, an application, a geography, or an area of responsibility. They can be converted into services as required and made available for executive dashboard views and SLA alerting. For information about business system construction, see 4.2.2, “Basic business system building” on page 119.
Tivoli Business Systems Manager provides facilities for off-loading business system information to Tivoli Data Warehouse and, from there, to IBM Tivoli Service Level Advisor.
  • 119. This information includes the hierarchical structure of each business system and the actual time spent in each of six states for every business system.
IBM Tivoli Service Level Advisor operates on service offerings that are defined manually and that have a set of metrics linked to the service when it is created.
Important: The practical approach to integrating Tivoli Business Systems Manager and IBM Tivoli Service Level Advisor is to model the IBM Tivoli Service Level Advisor service offering structures on Tivoli Business Systems Manager services. Tivoli Business Systems Manager business system data can then be used for more accurate measurement of availability for each defined service offering, while IBM Tivoli Service Level Advisor can notify the corresponding Tivoli Business Systems Manager service of pending SLA violations and trend alerts.

3.8.2 Real-time monitoring
Tivoli Business Systems Manager accepts data from a variety of sources, including most industry monitoring products. In addition, it accepts data from major scheduling packages, including Tivoli Workload Scheduler. Tivoli Business Systems Manager supports both distributed and mainframe data sources.
Tivoli distributed monitors communicate with Tivoli Business Systems Manager either through IBM Tivoli Enterprise Console or directly. Tivoli distributed products monitor resource changes and respond by sending predefined events to IBM Tivoli Enterprise Console. Through IBM Tivoli Enterprise Console rules, these events are then forwarded to Tivoli Business Systems Manager via an agent listener. Tivoli Business Systems Manager also includes many adapters for monitoring products; these adapters monitor instrumented environments and send resource changes directly to Tivoli Business Systems Manager via a common listener.
Monitoring products for distributed platforms use several techniques to capture resource changes and generate real-time events, such as log scanning adapters, SNMP managers, and IBM Tivoli Monitoring resource models. Each event is preclassified and assigned an alert state and a priority.
Tivoli Business Systems Manager also provides an OS/390® adapter for monitoring mainframe environments. It can communicate with Tivoli Business Systems Manager using either the IP or the SNA protocol. It supports several data feeds, such as z/OS, IMS, CICS, DB2, SA/390 automation, storage, WebSphere, network, and batch. The OS/390 adapter can capture console messages and timer-based polling events and generate predefined Tivoli Business Systems Manager events.
  • 120. Important: Tivoli Business Systems Manager extends real-time event monitoring into real-time monitoring of resource states. It adds value by processing incoming events and recognizing their impact on the state of the corresponding resources. Using business system constructs and propagation rules, Tivoli Business Systems Manager combines the states of related resources and enables real-time monitoring of services.

3.8.3 Historical monitoring
In addition to sending real-time events to Tivoli Business Systems Manager, IBM Tivoli monitoring products collect measurement data. Each monitoring product stores its data in its own product database and periodically transfers this historical data into Tivoli Data Warehouse using its warehouse enablement pack (WEP).
Tivoli Data Warehouse is a Tivoli product that offers a centralized database for all Tivoli product data. The schemas of this database are open and published, and systems management data from non-Tivoli products can also be integrated. As described in 3.3, “IBM Tivoli Data Warehouse” on page 64, the central data warehouse database uses a generic schema that is the same for all applications. As new components or applications are added, more data is added to the database, but no new tables are added to the schema. Historical data stored in Tivoli Data Warehouse is aggregated and correlated, and it can be used for reporting by many third-party tools.
The latest Tivoli Business Systems Manager WEP provides three enablement options:
- IBM Tivoli Service Level Advisor integration
- Tivoli Data Warehouse reporting
- IBM Tivoli Service Level Advisor integration and Tivoli Data Warehouse reporting
Although the Tivoli Business Systems Manager WEP includes programs that support all three options, the sequence in which the programs run depends on which option is selected. The Tivoli Business Systems Manager WEP includes both source and target ETLs.
The source ETL loads Tivoli Business Systems Manager data, such as managed resources, events, alert state changes, notes, and state transition measurements of business systems, into the central data warehouse database. The target ETL retrieves this data and loads it into the GTM schema in the data mart database.
Tivoli Business Systems Manager provides two options for reporting historical data via the same set of reports:
  • 121.
- Tivoli Business Systems Manager history server and reporting system, which provide Tivoli Business Systems Manager ASP reports
- Reports available through the Tivoli Data Warehouse reporting interface: Crystal Enterprise Professional for Tivoli
Tivoli Business Systems Manager information in the central data warehouse database is also used by IBM Tivoli Service Level Advisor to generate SLA reports. IBM Tivoli Service Level Advisor uses a set of ETLs to extract data from the central data warehouse database into the SLM measurement data mart database for further analysis and reporting.
For details about Tivoli Data Warehouse and IBM Tivoli Service Level Advisor data sources, see Chapter 4, “Planning to implement service level management using Tivoli products” on page 109. Each data source has a unique code that identifies the product with which it is associated.
Important: Tivoli Data Warehouse facilitates the integration of historical data from Tivoli and third-party products through a centralized database and a set of supported WEPs. The main task is to install and schedule these WEPs. Because the size of the database depends on the size of the IT enterprise, it is critical to plan the runs and estimate the timing of each WEP.

3.8.4 Fault management
Tivoli Business Systems Manager processes real-time events that are captured from a variety of data sources, stores them in the Tivoli Business Systems Manager database, and posts the appropriate alerts to the corresponding physical resources. Each incoming event has a predefined alert state and priority and is associated with a specific resource instance.
Events affect the state of a resource. Tivoli Business Systems Manager propagates state changes upward to the resource’s parents, which makes it possible to determine the status of business views. Propagation is implemented by generating a child event to parent resources. Tivoli Business Systems Manager can regulate propagation through a number of propagation rules.
For details about propagation scenarios, see Chapter 4, “Planning to implement service level management using Tivoli products” on page 109.
Tivoli Business Systems Manager provides several technologies to visualize resources, business systems, events, relationships, and impact. It supports three types of consoles: Java Console, Web Console, and Executive Console. Each view and console is designed to add value in a particular way. Combined, they deliver a powerful mechanism for real-time fault management.
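The upward propagation of state changes described in this section can be illustrated with a small model. This sketch assumes a single-parent resource tree, three ordered alert states (green, yellow, red), and a simple worst-child-wins rule. Real Tivoli Business Systems Manager propagation is regulated by configurable propagation rules, and a resource can belong to several business systems, so treat this only as a picture of the idea. The class, method, and resource names are hypothetical.

```python
# Ordered alert states: a higher number is more severe (assumed ordering).
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


class Resource:
    """A node in a hypothetical resource hierarchy (resource or business system)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.own_state = "green"   # state set by events posted to this resource
        self.state = "green"       # effective state after propagation
        if parent is not None:
            parent.children.append(self)

    def post_event(self, alert_state):
        """Post an alert to this resource and propagate any change upward."""
        self.own_state = alert_state
        self._recompute()

    def _recompute(self):
        # Worst-child-wins rule: the effective state is the most severe of the
        # resource's own state and the effective states of its children.
        candidates = [self.own_state] + [child.state for child in self.children]
        worst = max(candidates, key=SEVERITY.__getitem__)
        if worst != self.state:
            self.state = worst
            if self.parent is not None:
                self.parent._recompute()   # the change ripples up to the parent


if __name__ == "__main__":
    online_banking = Resource("Online Banking")   # business system
    web = Resource("web01", online_banking)
    db = Resource("db01", online_banking)
    tablespace = Resource("USERS tablespace", db)

    tablespace.post_event("red")
    print(db.state, online_banking.state)
    tablespace.post_event("green")
    web.post_event("yellow")
    print(db.state, online_banking.state)
```

Propagation stops as soon as a parent's effective state is unchanged, which keeps the number of parent updates proportional to the depth of the actual change.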
  • 122. Tivoli Business Systems Manager is designed to manage events in the SLM context through automatic alert propagation to prebuilt and dynamically constructed business systems and services. Tivoli Business Systems Manager events are preclassified by resource class, alert state, priority, and event type. Most of the defaults can be customized through a GUI, and new resource classes and events can be added. For details about Tivoli Business Systems Manager events and their classification, refer to IBM Tivoli Business Systems Manager Administrator’s Guide, SC32-9085.
Tivoli Business Systems Manager provides management facilities, but a customer’s preparedness plays a significant role in achieving effective fault management. Some of the preparation activities are:
- Identify which events can cause outages, and tune the Tivoli Business Systems Manager red defaults.
- Identify which events can cause degradation, and tune the Tivoli Business Systems Manager yellow defaults.
- Consider business impact when constructing business systems.
- Customize alert propagation rules to make alert management as effective as possible.
- Find the best use of the available views to match operational processes.
Customers need to classify faults. Tivoli Business Systems Manager red alerts, particularly those of critical or high priority, can be classified as faults. Tivoli Business Systems Manager yellow alerts, and perhaps some red alerts of medium and low priority, can be classified as warnings.
Prepare before rolling out Tivoli Business Systems Manager for production. Continuous adjustment and operator training help to improve the effectiveness of fault management and reduce the impact on service levels.
Important: A potential outage needs to be fixed as soon as possible to maintain SLA attainment. Faults may arrive at a rapid rate, and operators must respond to problems based on business impact. Prioritizing faults can greatly improve operator productivity and reduce problem investigation time.
Effective use of the event, impact, and topology views to evaluate events and their impact is essential to efficient fault management.

3.8.5 SLA reporting and alerting
Evaluation of SLAs is one of the main functions of IBM Tivoli Service Level Advisor. IBM Tivoli Service Level Advisor automates service level assessment against predefined thresholds and recognizes when SLAs are breached or about to be breached. In addition, IBM Tivoli Service Level Advisor
  • 123. provides management reports about actual service levels, SLA violation statistics, and trends toward SLA violations.
IBM Tivoli Service Level Advisor depends on the performance and availability data collected by a variety of monitoring and performance tools. This data is stored in the SLM measurement data mart, while all analysis and evaluation results are stored in the SLM database. You can retrieve the analysis data and summarize it into reports that you can view using a Web browser.
The SLM report console provides a colorful high-level summary report, displayed in table form, that shows totals of trends and violations across the reporting period, grouped by realms and customers. Clicking the table cells opens accompanying color charts and additional tables of summary information about trends and violations, key operations information, and specific details about particular customers and SLAs. For more details, refer to IBM Tivoli Service Level Advisor SLM Reports, SC32-1248.
IBM Tivoli Service Level Advisor analyzes data obtained from Tivoli Data Warehouse according to a predefined schedule. This data is evaluated for violations and for trends toward future violations of the agreed levels of service. Notifications of violations and trends are sent automatically by e-mail, SNMP traps, or IBM Tivoli Enterprise Console events.
IBM Tivoli Service Level Advisor evaluates the aggregate data collected from Tivoli Data Warehouse against predefined breach values (for each metric and schedule state period) to determine whether service levels are being maintained. If a breach value is violated, IBM Tivoli Service Level Advisor generates a violation event. For example, the breach value defined for a total is compared to the sum of all hourly values reported over the entire evaluation period. Similarly, the breach value for a maximum or a minimum is compared to the highest or the lowest single hourly value, respectively.
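The comparison of aggregate data against breach values can be sketched as below. The evaluation types mirror the three examples in the preceding paragraph (total, maximum, minimum); the `higher_is_worse` flag, the function names, and the use of plain lists of hourly values are assumptions made for illustration, not the IBM Tivoli Service Level Advisor implementation.

```python
def evaluation_value(hourly_values, kind):
    """Reduce the hourly values of one evaluation period to a single number."""
    if not hourly_values:
        raise ValueError("no measurement data for the evaluation period")
    if kind == "total":
        return sum(hourly_values)      # sum over the whole evaluation period
    if kind == "maximum":
        return max(hourly_values)      # highest single hourly value
    if kind == "minimum":
        return min(hourly_values)      # lowest single hourly value
    raise ValueError(f"unknown evaluation type: {kind!r}")


def is_violation(hourly_values, kind, breach_value, higher_is_worse=True):
    """Return True if the evaluation value breaches the breach value.

    higher_is_worse=True suits metrics such as response time or downtime;
    use False for metrics such as availability, where low values are bad.
    """
    value = evaluation_value(hourly_values, kind)
    return value > breach_value if higher_is_worse else value < breach_value


if __name__ == "__main__":
    hourly_downtime_minutes = [0.0, 12.0, 3.0]
    print(is_violation(hourly_downtime_minutes, "total", 10.0))   # True: 15 > 10
```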
IBM Tivoli Service Level Advisor uses a linear algorithm and an exponential stress detection algorithm to analyze existing measurement data and predict trends toward violations. Both algorithms are active and evaluate the same data for trends, each according to its own method of evaluation. Because of the iterative estimations and calculations used by the exponential stress detection algorithm, no graphical trend line is displayed for this algorithm with graph data. Trend lines displayed with graphs are associated with the linear algorithm only.
If the predicted value approaches the breach value, and either the linear or the exponential stress detection algorithm predicts that the value will exceed the breach value, a trend detection event is reported. If there is an outstanding trend detection event and the current evaluation value moves significantly away from the breach value, a trend cancel event is reported. However, if a violation occurs after the trend detection event, a trend cancel event is never reported.
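The linear part of the trend analysis, together with the trend detection and trend cancel rules described above, can be sketched as follows. This is an illustration under stated assumptions: the ordinary least-squares line, the projection point, and the hypothetical `cancel_margin` used to decide when the current value is "significantly away" from the breach value are all chosen for the example. A higher value is assumed to be worse, and the exponential stress detection algorithm is not modeled at all.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a trend line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(xs, ys, x_future):
    """Extend the trend line to a future point, such as the end of the period."""
    slope, intercept = linear_fit(xs, ys)
    return slope * x_future + intercept


class TrendTracker:
    """Hypothetical event logic for one metric whose breach is exceeded upward."""

    def __init__(self, breach_value, cancel_margin):
        self.breach_value = breach_value
        self.cancel_margin = cancel_margin
        self.trend_outstanding = False

    def evaluate(self, current, predicted):
        """Return 'violation', 'trend', 'trend_cancel', or None for one evaluation."""
        if current > self.breach_value:
            # A violation supersedes the trend: no trend cancel is sent for it.
            self.trend_outstanding = False
            return "violation"
        if not self.trend_outstanding and predicted > self.breach_value:
            self.trend_outstanding = True
            return "trend"
        if self.trend_outstanding and current < self.breach_value - self.cancel_margin:
            self.trend_outstanding = False
            return "trend_cancel"
        return None


if __name__ == "__main__":
    hours, response_ms = [0, 1, 2, 3], [60, 70, 80, 90]
    projected = predict(hours, response_ms, 6)
    tracker = TrendTracker(breach_value=100, cancel_margin=20)
    print(projected, tracker.evaluate(response_ms[-1], projected))
```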
  • 124. IBM Tivoli Business Systems Manager V3.1 introduced the Executive View console, which provides a dashboard approach to presenting service status to executives. Optionally, a service can show status information from IBM Tivoli Service Level Advisor as a Secondary Impact Information (SII) indicator. SII indicators do not follow the “normal” Tivoli Business Systems Manager status propagation rules. The status of an SLA SII alert is shown by a symbol rather than by a color.
IBM Tivoli Service Level Advisor can send SLA trend and violation events to IBM Tivoli Enterprise Console, where they are trapped by an IBM Tivoli Enterprise Console rule and forwarded to Tivoli Business Systems Manager via the event enablement and the agent listener. SLA alerts are posted to the corresponding service object and can be viewed in the Executive Console as secondary impact indicators. In addition, SLA alerts can be forwarded automatically to the people on the notification list via IBM Tivoli Enterprise Console e-mail and paging facilities.
Important: The actual evaluation takes place automatically when the IBM Tivoli Service Level Advisor ETL finishes moving the most recent measurement data from the data warehouse into the SLM measurement data mart. However, IBM Tivoli Service Level Advisor also provides advanced settings for intermediate evaluations, the frequency of trend analysis, and the logging of messages about missing data.

3.8.6 Problem and change management
Tivoli Business Systems Manager provides an integration function to create and track problem tickets. This includes opening and maintaining problem tickets that are stored and processed within a problem management application, and automatically creating problem tickets when certain types of messages or exceptions are generated. Another area of integration is creating and tracking change requests.
The Tivoli Business Systems Manager integration function is implemented using request processors.
A request processor is any program or script that can process command line input parameters, read a text-based input file containing data passed from the Tivoli Business Systems Manager integration function, and create a text-based output file with the results received from the problem or change management system integrated with Tivoli Business Systems Manager.
The following types of request processors can be used:
- Problem request processor: This is any request processor that implements interfaces for entering data and generating requests to create, query, search, find, retrieve, and update problem tickets. The Tivoli Business Systems Manager problem integration function displays the menu options for BSM problem ticket processing. It then transfers control to the user-written program for integration with the user’s problem management application.
  • 125.
- Change request processor: This implements interfaces for entering data and generating requests to create, query, search, find, retrieve, and update change requests. The Tivoli Business Systems Manager change integration function displays the menu options for Tivoli Business Systems Manager change request processing. It then transfers control to the user-written program for integration with the user’s change management application.
- Automatic ticket request processor: This is any user-written request processor that can process command line input parameters, read a text-based input file containing the data passed from the Tivoli Business Systems Manager automatic ticket integration function, and create a text-based output file containing the problem ID returned from the problem management application.
The automatic ticket integration function differs from the problem and change integration functions within the Tivoli Business Systems Manager product. It does not have a console interface. Its sole function is to create problem tickets and optionally generate automatic notifications by pager or e-mail. The automatic ticket integration function interacts with a user’s request processor when message or exception events are sent to Tivoli Business Systems Manager. All events are processed by the automatic ticket integration function based on predefined automatic ticket event rules, which provide the criteria for passing matched events to the request processor.
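The rule-based selection performed by the automatic ticket integration function can be pictured as a simple filter. The event fields (`resource_class`, `alert_state`, `priority`) echo the classification attributes mentioned earlier in this chapter, but the rule format, the matching semantics (every field named in a rule must match exactly), and the sample values are hypothetical. Consult the product documentation for the real automatic ticket event rule definitions.

```python
def rule_matches(rule, event):
    """A rule matches when every field it names has the same value in the event."""
    return all(event.get(field) == expected for field, expected in rule.items())


def events_for_ticketing(events, rules):
    """Return, in arrival order, the events that match at least one rule.

    Only these events would be passed on to the request processor.
    """
    return [event for event in events if any(rule_matches(r, event) for r in rules)]


if __name__ == "__main__":
    rules = [
        {"alert_state": "red", "priority": "critical"},
        {"resource_class": "DB2_Database", "alert_state": "red"},
    ]
    events = [
        {"resource_class": "Web_Server", "alert_state": "red", "priority": "critical"},
        {"resource_class": "Web_Server", "alert_state": "yellow", "priority": "low"},
        {"resource_class": "DB2_Database", "alert_state": "red", "priority": "medium"},
    ]
    for event in events_for_ticketing(events, rules):
        print(event["resource_class"], event["priority"])
```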
When the Tivoli Business Systems Manager console is set up to work with problem and change management systems, the user can perform the following tasks:
- Create, find, update, and close problem tickets. Two types of create are supported: from the context menu of a resource and from an ownership note.
- Create, find, update, and close change requests.
Important: Tivoli Business Systems Manager provides integration functions and request processors for problem, change, and automatic ticketing. Users must develop their own customized programs to interface with their change and problem management systems. Most problem and change management applications provide some type of API. After a Tivoli Business Systems Manager request is processed, the interface programs must return control to the Tivoli Business Systems Manager exit point and provide notification of the results.
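A user-written request processor of the kind described above might be structured like the following sketch. Everything about its interface is an assumption for illustration: the two command line parameters (input and output file paths), the `key=value` file format, the `action` and `resource` keys, and the hypothetical `create_ticket` placeholder, which stands in for a call to your problem management application's API. The real parameters and file layouts are defined in the Tivoli Business Systems Manager documentation.

```python
import sys


def parse_request(lines):
    """Parse hypothetical 'key=value' lines, skipping blanks and '#' comments."""
    request = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        request[key.strip()] = value.strip()
    return request


def create_ticket(request):
    """Placeholder for the problem management application's API call.

    A real processor would call that application here; this sketch only
    builds a deterministic ID so that the example can run on its own.
    """
    return "PRB-" + request.get("resource", "UNKNOWN")


def process(lines):
    """Turn the input file's lines into the output file's lines."""
    request = parse_request(lines)
    action = request.get("action")
    if action == "create":
        return ["status=ok", "problem_id=" + create_ticket(request)]
    return ["status=error", f"message=unsupported action {action!r}"]


def main(argv):
    if len(argv) != 3:
        print("usage: request_processor.py INPUT_FILE OUTPUT_FILE", file=sys.stderr)
        return 2
    with open(argv[1], encoding="utf-8") as infile:
        results = process(infile.readlines())
    with open(argv[2], "w", encoding="utf-8") as outfile:
        outfile.write("\n".join(results) + "\n")
    return 0   # the exit status tells the caller whether the request was handled


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Keeping parsing and ticket creation separate from file handling makes the processor easy to test without Tivoli Business Systems Manager in the loop.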
  • 126. Chapter 4. Planning to implement service level management using Tivoli products
The starting point for this chapter is that a decision has been made to implement service level management (SLM) in accordance with IT Infrastructure Library (ITIL) recommendations, and that IBM Tivoli Business Systems Manager and IBM Tivoli Service Level Advisor are used as key parts of the overall solution. The chapter is written from the perspective of an IT consultant assigned to plan and implement a solution. It covers the following topics:
- An overview of the SLM process introduced in Chapter 2, “General approach for implementing service level management” on page 23, with each stage described in the context of IBM Tivoli products
- An in-depth technical overview of the IBM Tivoli products that are used for SLM
- An in-depth technical description of selected new features of IBM Tivoli Business Systems Manager V3.1 and IBM Tivoli Service Level Advisor V2.1 that are exploited for SLM
- A brief overview of additional IBM Tivoli products that are used for SLM
© Copyright IBM Corp. 2004. All rights reserved.
  • 127. 4.1 Implementing SLM using Tivoli products
This section reviews the stages of implementing SLM described in Chapter 2, “General approach for implementing service level management” on page 23. It describes each stage in the context of the IBM Tivoli products introduced in Chapter 3, “IBM Tivoli products that assist in service level management” on page 53, and briefly explains how those products contribute to each stage of the SLM implementation process. Figure 4-1 illustrates the planning, implementation, ongoing SLM program, and improvement process stages.
Planning
- Established decision to implement SLM
- Define key players: Project Sponsor, Service Level Manager, Project Manager, Business Representatives, IT Representatives
- Understand the services: define services, establish initial perception of the services, define expected quality of services
- Assess ability to deliver: analyze existing infrastructure, verify existing monitoring capabilities, establish baseline for measurement
Implementation
- Develop service level objectives: describe services, determine service level indicators, determine metrics to be used
- Negotiate on service level agreements: review SLOs with business owners, agree on metrics to be used, agree on reporting requirements
- Implement SLM management tools: implement additional monitoring capabilities, enhance existing monitoring tools if required, integrate data collected by monitoring, implement business service management tools, automate service management
- Establish reporting function: periodicity, recipients, formats
- Adjust IT processes to include SLM: Service Support processes, Service Delivery processes
Ongoing SLM program
- Maintenance of services definitions
- SLA management via historical reporting
- Priority management of real-time faults
Improvement Process
- Improving quality of service levels
- Improving efficiency of SLM
- Improving effectiveness of SLM
Figure 4-1 SLM processes implementation approach
  • 128. 4.1.1 Planning
During the planning stage, become familiar with the capabilities and features of the IBM Tivoli products that are available to you. This includes learning about any new products and revisiting your perceptions of existing, installed products: what may now be an underused event monitor may well become a key tool in SLM. This idea is explored further in “Understanding the services” on page 111 and “Implementing additional monitoring” on page 113.
Defining the key players
Establish the providers and customers of SLM. Establish who will use the SLM tools and what their roles are. When the users and roles are established, map them to the users and roles provided in IBM Tivoli Business Systems Manager and IBM Tivoli Service Level Advisor. The IBM Tivoli Business Systems Manager and IBM Tivoli Service Level Advisor user roles are described further in 4.2.6, “IBM Tivoli Business Systems Manager roles in an SLM context” on page 132. Practical application of these roles is detailed in Part 2, “Case study scenarios” on page 195.
Understanding the services
Understanding the services is a key part of SLM implementation, and it is particularly important to the IBM Tivoli Business Systems Manager implementation. See Chapter 2, “General approach for implementing service level management” on page 23, “Business process-based IBM Tivoli Business Systems Manager business systems” on page 122, and “Data gathering and business system decomposition” on page 134.
Assessing the ability to deliver
It is important to analyze the infrastructure to assess its capability to provide the services defined in the previous steps. It is also important to know which applications can monitor the various variables of that infrastructure. Refer to Chapter 3, “IBM Tivoli products that assist in service level management” on page 53, for a brief description of some of the Tivoli monitoring applications that are available.
At this point, you can define an initial target for the level of service. For example, a service level agreement (SLA) for service A might state that the service must be available 99% of the time, measured over a one-month reporting period. Review this initial target regularly, because working toward an obviously unreachable target is unrewarding. You can use IBM Tivoli Service Level Advisor to gather basic metrics for this service. As new data feeds and processes are introduced, you can change the SLA to suit the organization’s ability to deliver.
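A percentage target is easier to negotiate and monitor when it is also expressed as a downtime budget. The following Python sketch is only an illustration: it assumes a 30-day reporting month and treats availability as simple uptime divided by the length of the period, which may differ from how IBM Tivoli Service Level Advisor evaluates a schedule. The function name `allowed_downtime` is hypothetical.

```python
from datetime import timedelta


def allowed_downtime(target_pct: float, period: timedelta) -> timedelta:
    """Return the maximum downtime that still meets an availability target.

    target_pct is the availability objective as a percentage (for example, 99.0);
    period is the length of the SLA reporting period.
    """
    if not 0.0 <= target_pct <= 100.0:
        raise ValueError("target_pct must be between 0 and 100")
    # The unavailable fraction of the period is (100 - target) / 100.
    return period * ((100.0 - target_pct) / 100.0)


if __name__ == "__main__":
    # Assumption: a 30-day reporting month.
    print(allowed_downtime(99.0, timedelta(days=30)))  # prints 7:12:00
```

In other words, a 99% monthly target leaves roughly 7 hours and 12 minutes of unplanned outage per 30-day month, which is a useful figure to check against the baseline before committing to the target.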
  • 129. 4.1.2 Implementation
In the implementation phase, you install new Tivoli products and review existing Tivoli and other systems management products for use in SLM.
Developing service level objectives
After you understand the services, you can begin to define service level objectives (SLOs) for them. Define the SLOs in terms of the information available from the infrastructure; that is, base the objectives on what the available tools can measure. For this reason, review the SLO definitions whenever new monitors are introduced, because a new monitor can bring in new metrics that allow a service to be measured differently.
When developing SLOs, differentiate between two types of metrics: external and internal. External metrics are defined in the SLA contract and are visible to the customer. An example of an external metric is Overall Response Time of Service. Internal metrics are supplementary metrics from system monitors that the service provider can use proactively to ensure that the contract is being met. Internal metrics are not shown to the customer and are not part of the SLA contract. An example of an internal metric is Response Time of the DB2 Databases Used by the Application.
Negotiating service level agreements
After you develop the SLOs, negotiate the SLA. As in any negotiation, have all the relevant information available for this step. The most important information is the current level of the service, based on the metrics chosen in the previous step. You obtain this information by evaluating historical data.
Assuming that the monitoring applications have been collecting information from the infrastructure for some time, you can use IBM Tivoli Service Level Advisor to see retrospectively how the service has been performing. To see how to implement this, refer to 4.4.1, “Building SLAs in IBM Tivoli Service Level Advisor” on page 156. After the negotiation, you may want to review and adjust the SLA that was created.
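The kind of retrospective check described above can be approximated by hand before a negotiation. This hypothetical Python sketch computes the availability achieved over a reporting period from a list of outage intervals. It clips outages to the period and merges overlapping ones so that no downtime is counted twice. The function name, the outage-list input format, and the sample outages are assumptions for illustration; this is not how IBM Tivoli Service Level Advisor stores or evaluates data.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

Outage = Tuple[datetime, datetime]  # (start, end) of one recorded outage


def achieved_availability(outages: Iterable[Outage],
                          period_start: datetime,
                          period_end: datetime) -> float:
    """Percentage of [period_start, period_end) during which the service was up."""
    total = period_end - period_start
    if total <= timedelta(0):
        raise ValueError("period_end must be after period_start")
    # Clip every outage to the evaluation period and drop those outside it.
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in outages
        if min(end, period_end) > max(start, period_start)
    )
    # Merge overlapping outages so shared downtime is counted only once.
    downtime = timedelta(0)
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)
            continue
        if cur_end is not None:
            downtime += cur_end - cur_start
        cur_start, cur_end = start, end
    if cur_end is not None:
        downtime += cur_end - cur_start
    return 100.0 * (1 - downtime / total)


if __name__ == "__main__":
    # Illustrative sample data only.
    june_start, june_end = datetime(2004, 6, 1), datetime(2004, 7, 1)
    outages = [
        (datetime(2004, 5, 31, 23), datetime(2004, 6, 1, 1)),  # starts before the period
        (datetime(2004, 6, 3, 10), datetime(2004, 6, 3, 12)),
        (datetime(2004, 6, 3, 11), datetime(2004, 6, 3, 13)),  # overlaps the previous one
    ]
    achieved = achieved_availability(outages, june_start, june_end)
    print(f"{achieved:.3f}% achieved; meets 99% target: {achieved >= 99.0}")
    # prints 99.444% achieved; meets 99% target: True
```

Comparing the achieved figure with a candidate target shows whether the proposed SLA is realistic before it is agreed.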
  • 130. Implementing additional monitoring
This is an extremely critical stage and a prerequisite for SLM. It covers the following tasks:
Increase the rollout of existing systems management tools to cover gaps in monitoring. The business process decomposition may reveal gaps in monitoring. Determine whether these can be filled by your existing systems management tools.
Re-assess, re-invent, and exploit existing systems management solutions to cover gaps in monitoring. This is an extension of the previous task. Most systems management tools have features and functions that are not exploited. Re-assess all the existing systems management tools to see whether they can be exploited further to cover the monitoring gaps.
Review and re-engineer existing systems management solutions to ensure event quality. IBM Tivoli Business Systems Manager and IBM Tivoli Service Level Advisor can only be as good as the information that is sent to them. If every event sent by the monitors, trivial or critical, is marked as critical, there is no way to truly assess the business impact of the events: every business system is shown as critical, and the management of the business processes is essentially blind. It is imperative that events sent from the monitors reflect the true severity of the event on the component, conform to message ID standards, and, ideally, have a corresponding goodness event that closes the original event when the bad situation no longer applies. Standardizing events is often substantial work, but it is necessary if SLM is to be successful.
Implement new IBM Tivoli Monitoring products to cover gaps in monitoring. Some of the monitoring gaps may not be covered by the existing systems management skills or products. Use IBM Tivoli Monitoring products to cover the remaining gaps.
Examples are:
– IBM Tivoli Monitoring
– IBM Tivoli Monitoring for Database
– IBM Tivoli Monitoring for Business Integration
– IBM Tivoli Monitoring for Web Infrastructure
These products measure the internal performance of systems and applications. Their functions include continuous monitoring and recording of information, raising alerts when thresholds are exceeded, and gauging user experience by making response time measurements. These products can monitor hardware, databases, and applications.
  • 131. Implement IBM Tivoli Monitoring for Transaction Performance to provide user-experience monitoring. User-experience monitoring is key to providing an end-to-end view of a service. Implementing and exploiting IBM Tivoli Monitoring for Transaction Performance is explained in 4.5.1, “IBM Tivoli Monitoring for Transaction Performance” on page 190, and in Part 2, “Case study scenarios” on page 195.
Implementing SLM analytical and automation tools
This is the actual implementation stage of IBM Tivoli Business Systems Manager and IBM Tivoli Service Level Advisor. In this stage, you also implement any required supporting tools, such as Tivoli Data Warehouse and IBM Tivoli Enterprise Console (TEC). Details of implementation are covered in Part 2, “Case study scenarios” on page 195.
Establishing a reporting function
Reports in this solution are available on demand. You can request them to see the status of the services at any point in the evaluation period. The main task here is to define the various users and the access they have to the information in the solution. For details about how to do this, see “Reports” on page 164. After you create the users, check the available IBM Tivoli Service Level Advisor reports to ensure that the users can see what they need to see. For examples of the views that are available to the various users and roles, see Part 2, “Case study scenarios” on page 195.
Adjusting IT processes to include SLM
Sometimes it is necessary to revise operational processes and practices to ensure that SLM data is accurate. For example, ensure that the state of a system or application is not taken into account during a maintenance period, because downtime during planned maintenance would otherwise reduce its measured overall availability. Another example is to revise the change process as required.
This ensures that the SLM tools are included in the scope of changes, so that business systems and SLAs can be updated accordingly.
4.1.3 Ongoing SLM program
This task covers continuous monitoring, reporting, and reviewing of the SLAs. The main idea is to be proactive and identify possible problems in the infrastructure before they affect the SLA at the end of the evaluation period.
  • 132. Many IBM Tivoli Service Level Advisor capabilities can be used for this.
Trends toward violations
IBM Tivoli Service Level Advisor calculates trends toward violations for any metric selected to be part of an SLA. It analyzes the data for the metric and sends a trend event when the algorithm detects that the data shows a linear or stress exponential trend that may result in a violation within a predetermined interval. See Chapter 5, “Case study scenario: IRBTrade Company” on page 197, for an example.
Intermediate evaluations
These evaluations are done more frequently than the evaluation for the SLA reporting period. A common situation is a monthly evaluation with a daily intermediate evaluation. With this, the IT organization can check the status of its various services every day and take action while it is still possible to affect the SLA result at the end of the month. For details about this function, refer to Part 2, “Case study scenarios” on page 195.
Adjudication
In some situations, violations occur under conditions that, according to the SLA contract, can be adjudicated. For example, if the number of users of a certain application exceeds the number specified in the contract, the violation for the month can be adjudicated. Refer to “Adjudication” on page 170 for details.
4.1.4 Improvement process
SLM is a continuous process, and improvement opportunities do not end.
Reviewing service requirements changes
As mentioned earlier, it is important that changes to the environment are reflected in the SLM tools. IBM Tivoli Business Systems Manager can be used to enhance change requests, and it should be closely involved in planning service changes. By using the Business Impact view on an object within IBM Tivoli Business Systems Manager, you can see every business process that can be affected by the change and manage the change accordingly.
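The idea behind the Business Impact view, finding every business system that a given component rolls up into, can be sketched as a tree search. The following Python sketch is a hypothetical model, not the IBM Tivoli Business Systems Manager data model or API: the `Node` class, the `impacted_by` function, and the sample names are assumptions. Because one component (for example, a shared database) may belong to several business systems, the search visits every branch instead of stopping at the first match.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """A business system or a component inside one (hypothetical model)."""
    name: str
    children: List["Node"] = field(default_factory=list)


def impacted_by(root: Node, component: str) -> List[str]:
    """Return the sorted names of every node that contains `component`,
    directly or indirectly."""
    impacted: Set[str] = set()

    def contains(node: Node) -> bool:
        found = node.name == component
        for child in node.children:
            if contains(child):  # visit every child; do not short-circuit
                found = True
        if found and node.name != component:
            impacted.add(node.name)
        return found

    contains(root)
    return sorted(impacted)


if __name__ == "__main__":
    db = Node("DB2 database")  # shared by two business systems
    trading = Node("Online trading", [Node("Web server"), db])
    reporting = Node("Reporting", [db])
    root = Node("All business systems", [trading, reporting])
    print(impacted_by(root, "DB2 database"))
    # prints ['All business systems', 'Online trading', 'Reporting']
```

A change request against the database would therefore need sign-off from the owners of both business systems, not only the one that prompted the change.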
When a service change requires new components, ensure that the new components are added to the IBM Tivoli Business Systems Manager business system before or when the change becomes active. If a new component is added before it goes live, use the IBM Tivoli Business Systems Manager Maintenance function to suppress event propagation from the object while it is in test. This function is described in IBM Tivoli Business Systems Manager Administrator’s Guide, SC32-9085.
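To make the effect of suppressing event propagation concrete, the following hypothetical Python sketch rolls the worst severity in a business system up to its top while skipping any object flagged as in maintenance. The severity names, the `Node` class, the "worst state wins" roll-up rule, and the choice that an object in maintenance also masks everything beneath it are all assumptions for illustration. The product's actual behavior is documented in the Administrator’s Guide.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical severity scale, lowest to highest.
SEVERITIES = ["normal", "warning", "minor", "critical"]


@dataclass
class Node:
    """A business system or component (hypothetical model)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    severity: str = "normal"      # state set by this object's own events
    in_maintenance: bool = False  # object is in test; do not propagate its state


def rolled_up_severity(node: Node) -> str:
    """Severity that `node` propagates upward: the worst state at or below it.

    An object in maintenance propagates nothing, so neither it nor anything
    beneath it raises the state of the business systems above it.
    """
    if node.in_maintenance:
        return "normal"
    worst = SEVERITIES.index(node.severity)
    for child in node.children:
        worst = max(worst, SEVERITIES.index(rolled_up_severity(child)))
    return SEVERITIES[worst]


if __name__ == "__main__":
    new_server = Node("New application server", severity="critical",
                      in_maintenance=True)  # still in test
    trading = Node("Online trading",
                   [Node("Web server", severity="warning"), new_server])
    print(rolled_up_severity(trading))  # prints warning
    new_server.in_maintenance = False   # component goes live
    print(rolled_up_severity(trading))  # prints critical
```

While the new server is in test, its critical events do not turn the business system red; once it goes live, they do.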
  • 133. Decommissioning of resources is not reflected in IBM Tivoli Business Systems Manager. A decommissioned object remains in the business system but no longer receives events. Such decommissioned objects in business views have no effect on the continued functioning of IBM Tivoli Business Systems Manager, but they can be cleaned up as a maintenance task to avoid accumulating too many of them. You can use Automatic Business Systems (ABS) and Extensible Markup Language (XML) business system building to ensure that changes to the service are reflected in IBM Tivoli Business Systems Manager. Failure to reflect service