Shows how a major Financial Services company uses the ITIL Service Management framework to ensure its Windows platform delivers a quality service to its end users in a cost-effective manner.
It will demonstrate how this is achieved through a series of regular monthly and daily reports that have been designed for various levels of management, including IT and Business managers. The reports are generated using the RG Solutions product supplied by Computer Performance International Ltd.
The discussion will include a walkthrough of the five ITIL Service Management categories of its reports, the benefits these bring, and the results that have been achieved.
22. Online Failover Risk
Monthly Report Service Continuity
[Chart: Online Capacity, 0% to 100%, for Core Systems, ExchCAS, ExchMB, SQL01, Treasury and Datawarehouse. CPU labels on chart: 32, 32, 8, 8, 32 and 20 CPUs.]
(Please refer to the Definitions section for online service times)
23. Cluster Failover Risk
[Chart: Cluster Risk, Potential Duration of Service Impact, for Core Systems, ExchCAS and ExchMB]
24. Service Availability
System Name | Availability % | Server Downtime Timings
Core System 01 | 99.77 | The system was taken down on Thursday 19th May at about 12:30. From 13:00 to 13:10 the system was up for 4 minutes and was then taken down again. As of about 13:07, the system was back up for the rest of the day.
SQL01 | 99.94 | The system was taken down on Wednesday 4th May at 19:10. The system was back up at about 19:19.
SQL02 | 99.77 | The system was taken down on Thursday 19th May at about 11:10. From 11:40 to 11:50 the system was up for about 5 minutes and was then taken down again. As of about 11:55, the system was back up for the rest of the day.

System Name | Online Availability %
SBS Online Service | 100
Monthly Report Availability
[Chart: Service and Server Online Availability, 95% to 100%, for Core Online Service, Core System 01, Core System 02, ExchCAS0, ExchCAS1, ExchMB0, ExchMB1, SQL01, SQL02, Treasury01, Treasury02, Datawarehouse01 and Datawarehouse02]
29. Benefits
• ITIL advantages
• Stability, high utilisation
• Reassures board and auditors
• Early warning on issues
• Staff training
30. Working with Supplier
• Trusted partner/flexibility
• 25 year relationship
• Historical focus on client requirements
• Expertise with no dedicated resource
• Beware complacency
• Dependency risk
• Costs/budgets
31. What did we learn?
• ITIL framework works well
• Report to the Business, not IT
• Check accuracy of management reports
• Ensure continuous improvement
• Open door on new reports
33. ITSMF UK
Premier Gate, Easthampstead Road, Bracknell,
RG12 1JS, United Kingdom
Tel: +44 (0) 118 918 6500 | Web: www.itsmf.co.uk
Editor's Notes
ITIL in action at Skipton Building Society
My name is Chris Brown and I work for Computer Performance International, that’s CPI for short. I came along to the ITSM conference for the first time last year and liked what I heard and the people I met. Listening to the presentations, I thought, wouldn’t it be good to talk about our experiences with Service Delivery Reporting at the Skipton Building Society. So, at our next quarterly meeting at Skipton, I asked my opposite number, Colin, who is the Service Operations Manager at Skipton, if he would like to do a joint presentation. I was pleased to find that Colin was as keen to do this as I was, and here we are.
My name is Colin McMahon and I have been working in IT Service Management roles for over 30 years at Skipton. I have an ITIL v2 Manager qualification, bridged to ITIL v3 Expert. We have contracted CPI to provide Capacity Management software and services for 25 years. Back in 2006, we asked CPI to structure our reporting in line with the ITIL standard, which has recently helped us with our latest reporting requirements. Preparing the presentation has been an experience for both of us with some surprises and we hope you will find this as interesting as we did.
Overview – Chris
Ensure Skipton provides Quality, Cost-Effective Service
How we have Approached This
A detailed walk through of how this works
Pros and Cons of this approach
What we have learned
IT Configuration - Colin
2 Physical Sites Active/Active
63 Physical Cisco UCS blade Servers
24 ESX hosts + Windows Native
543 virtual servers – VMware 5.5 + 6
Windows Server 2008 and 2012
EMC VMAX arrays 300TB raw at each site, sync replication
0 RPO
Local and geographical windows clustering
Site Recovery Manager + vMotion for VMware
What CPI Provides - Colin
Monthly & Daily Reports
Windows & VMware Hypervisor, focus on Windows
Use ITIL2 Service Delivery Framework from Skipton
Skipton SM aligned with ITIL principles in 2006/7
Tailored to Skipton Requirements.
Aids dialog and provides MI to Development, Operations, DBA, Audit & IT Risk
Online Browse for Detailed Analysis.
Capacity Planning Expertise & Ad Hoc Reports (before and after move to UCS)
RG Solutions – Chris
RG Solutions product accumulates long term data
Standard interfaces – Windows Objects & Counters
Data from logs for application & transactions.
Daily download & import
Generates Service Management reports
Browse using Explorer style interface
How this is delivered - Colin
The monthly report – the key report – issues highlighted, management summary
Daily Report supplements monthly – short term trends – 14 days rolling view
Management report for monthly Capacity Management meeting delivered to quarterly Capacity Planning meeting
Resiliency Report is for Key Risk (DR) monthly report – used as part of monthly Board reporting process.
Quarterly meetings – business update; report enhancements; process review; training; continuous improvement enabler
Monthly Report Sections - Chris
Initially Sceptical, Technical, Capacity
The monthly report follows the ITIL2 Service Delivery structure.
Service Levels
Business Management
Capacity Management
Service Continuity
Availability
The Monthly Report - Colin
Received in the first week of the month covering the previous month.
Circulated to Management & Technical teams
Graphics avoid jargon, easy to read
An aid for IT dialogue with the Business.
Pdf format for printing/mobile device viewing
Target is 30 minutes to digest
Important Notes linked report
Allows Quick Review - Initial Analysis completed by CPI
Service Level Management - Colin
This is the Service Level Management section of the Monthly Report
Skipton is an online business, focus is response times
Shows % of transaction response time < 1 second duration over the last 12 months.
Business / logical transactions as captured in the application code
Used by IT management
Monitor the trend of response times
Give a flavour of colleague experience using the services
Highlighting issues.
Consistency of response time from month to month important.
See improvement after hardware refresh in January: 90-91% to 95-96%.
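As an illustration of the measure on this chart, the following is a minimal sketch of how the percentage of transactions under one second could be calculated. The function name and sample timings are hypothetical; this is not how RG Solutions computes it.

```python
def pct_under_threshold(response_times_s, threshold_s=1.0):
    """Percentage of transactions that completed in under threshold_s seconds."""
    if not response_times_s:
        raise ValueError("no transactions recorded")
    fast = sum(1 for t in response_times_s if t < threshold_s)
    return 100.0 * fast / len(response_times_s)


# Hypothetical sample: ten transaction response times in seconds.
sample = [0.2, 0.4, 0.9, 1.3, 0.7, 2.1, 0.5, 0.3, 0.6, 0.8]
print(pct_under_threshold(sample))  # 8 of 10 are under 1 second
```

Running this once per month over that month's transactions would give the 12 points plotted on the trend.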
Batch Turnaround – Colin
Batch is – update of systems
Designed and allocated to run in a certain order to produce business reports, apply interest to accounts, extract files, letters and statements and cheques for customers.
Reports split into parallel flows, shortest run time, no impact on next online day
Critical Overnight batch turnaround for each day of month as % of Window
Constrained by increasing online availability requirement 22:00 and 08:00
Some online channels are coded to work during batch, some not; branches and call centre not.
Blanks are delays or backups at start
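A rough sketch of "turnaround as % of Window" follows. It assumes a 10 hour window starting at 22:00 and measures elapsed time from the start of the window to the end of the critical batch; the dates, times and function name are made up for illustration.

```python
from datetime import datetime


def batch_pct_of_window(window_start, batch_end, window_hours=10.0):
    """Elapsed batch time as a percentage of the overnight window."""
    elapsed = batch_end - window_start
    return 100.0 * elapsed.total_seconds() / (window_hours * 3600)


# Hypothetical night: window opens at 22:00, critical batch ends at 04:30.
start = datetime(2016, 5, 3, 22, 0)
end = datetime(2016, 5, 4, 4, 30)
print(batch_pct_of_window(start, end))  # 6.5 hours of a 10 hour window
```

A value close to 100 means the batch is close to running into the next online day.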
Month End Turnaround – Chris
Financial Systems monthly cycle, heavy month end batch.
6 month trend from 20:00 to 08:00.
Daily batch followed by month end.
The gap is to move data to data warehouse systems.
ME – split into inquiry and updates
Inquiry runs against a separate database for parallel
Update used to produce statements, investment account year ends etc.
Mid-week month-ends critical, 2 hours early at 20:00.
Transaction Count Annual Trend – Chris
We now move to Business Management section.
Business growth transaction count rolling 13 months for year on year comparisons
Shows growth trend and comparison with last year.
Used by IT management and Business managers in monthly MI packs produced by IT.
This chart does tend to follow business
April is ISA year end spike – ISAs arrive from other providers, also transferred out, plus new accounts opened
Transaction Count by Application – Chris
Breakdown of major transaction types
Online Mortgages
Online Savings
Core Systems – Central backend system, referred to as Enterprise by the business
Background Worker – initiated from the online system to complete background tasks such as letter production.
Transaction Count by Application Trend - Colin
3 month by week trend for each application.
1 chart for each of 4 types.
Used by IT Managers
Background Worker
Time neutral policy for developers making changes to batch
Developers identified tasks that would normally be run using additional batch reports, setup of workflow tasks.
Coded to kick in at specific times or specific events e.g. Rate changes- first one for over 7 years
Used the background workers of the online system
Produced additional transaction logs that took extended time to replay into the data warehouse
Whilst the issue was identified, CPI report contributed to Problem Management investigation and feedback to the developers
Daily CPU Usage Trend – Chris
These are thumbnail charts for repetitive at-a-glance reports.
CPU daily online % usage, rolling 3 months, one for each server.
Identify weekdays, Sat, Sun, Monday peak.
These show any significant changes and short term trends.
Used by Server Support team to escalate to Management team.
Story with svchost problem last year?
These are Average not Peak, not for Capacity Planning purposes
Online Capacity – Chris
This is Capacity Management Section
This is a key chart - how full are servers during online.
Used by Capacity Planning team
Calculate Capacity each hour of month out of 80% max
We take Peak online & 95th %ile
Chart Grouped by failover Cluster for 12 key servers
Use 95th %ile as Capacity for Planning
Peak vs %ile shows stability
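The peak and 95th percentile figures could be derived along these lines. This is a sketch only: it assumes one CPU % reading per online hour and a nearest-rank percentile, neither of which is confirmed as the method the report uses, and all names and readings are hypothetical.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile; pct is a whole number from 1 to 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def online_capacity(hourly_cpu_pct, ceiling_pct=80.0):
    """Express peak and 95th percentile usage as a share of the 80% ceiling."""
    peak = max(hourly_cpu_pct)
    p95 = percentile_nearest_rank(hourly_cpu_pct, 95)
    return {
        "peak": 100.0 * peak / ceiling_pct,
        "p95": 100.0 * p95 / ceiling_pct,
    }


# Hypothetical server: steady at 30% CPU with a single one-hour spike to 72%.
readings = [30.0] * 19 + [72.0]
print(online_capacity(readings))
```

A wide gap between the two figures, as in this sample, points to an isolated spike; figures that sit close together suggest a stable workload.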
Online Capacity History – Colin
Trend of Online capacity, 1 chart for each server
Out of Capacity in December 2015
January hardware refresh, move to UCS servers
Batch Capacity - Chris
Similar to chart in Service Management section.
Used by Capacity Planning
Batch end time for each day of month.
Elapsed time during 10 hour overnight for critical batch run.
Includes contingency time of 2 hours.
Critical Batch Flow - Chris
Aim is to run as many processes in parallel.
Used by Operations Scheduling/Developers
Visualise Batch flow
Covers 21:00 – 02:00 with list of batch processes
This is Critical Batch flow for longest run
Highlights Gaps & Candidates for improvement
Presentation identified new report required
Disk Occupancy Volatility – Colin
Identifies disks that are becoming full on a month by month basis.
Shows range of disk occupancy across the month, volatility.
Growth, green line indicates the start of the month, dark blue line end of month.
One chart for each server
Used by Database team and Infrastructure team.
Anything more than 90% at end of month highlighted in Important Notes
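The figures behind each disk chart could be summarised as below. This is a sketch assuming one occupancy reading per day; the field names and sample values are hypothetical, and only the 90% end-of-month threshold comes from the report.

```python
def disk_volatility(daily_occupancy_pct, alert_pct=90.0):
    """Summarise one disk's occupancy over a month of daily readings."""
    start = daily_occupancy_pct[0]
    end = daily_occupancy_pct[-1]
    return {
        "start": start,                   # green line on the chart
        "end": end,                       # dark blue line on the chart
        "low": min(daily_occupancy_pct),  # bottom of the monthly range
        "high": max(daily_occupancy_pct), # top of the monthly range
        "growth": end - start,
        "important_note": end > alert_pct,
    }


# Hypothetical disk: grows through the month and ends above 90%.
print(disk_volatility([82.0, 85.5, 88.0, 93.0, 91.5]))
```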
Memory Usage - Chris
Used by Database & Infrastructure teams
Thumbnail charts, 1 for each server, daily peak memory usage for rolling 3 months.
Shows memory trends in case these become critical.
Can change dramatically, showing failover or memory issues.
This example shows a cluster failover at the end of September.
Cluster Failover Risk – Chris
This is Service Continuity Section
Used by IT Management
Highlights Failover Risk
Combined Capacity for Online Failover
Peak Hour, out of 80%
ExchMB critical, Treasury warning
Validated by ad-hoc reports during failovers
Cluster Failover Risk – Colin
Used by technical teams, visual indication to IT management.
Green is less than 80%, Yellow less than 100% and Red over 100%.
Quantifies risk (number of hours).
Exchange Action Plan
Based on 80% usage
Increased VCPUs from 4 to 8
Increased memory
Updated Drivers
Planning move to dedicated Datastores
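The traffic light bands and the "number of hours" risk figure can be sketched as follows. This assumes a two node cluster with equally sized nodes, so that hourly CPU percentages can simply be added to model one node carrying both loads. It also treats exactly 100% as Red, which the report does not specify. All names and readings are hypothetical.

```python
from collections import Counter


def failover_status(combined_pct):
    """Map combined usage onto the report's traffic light bands."""
    if combined_pct < 80.0:
        return "green"
    if combined_pct < 100.0:
        return "yellow"
    return "red"  # assumption: exactly 100% counts as red


def risk_hours(node_a_hourly_pct, node_b_hourly_pct):
    """Count the hours in each band if one node had to carry both loads."""
    counts = Counter(
        failover_status(a + b)
        for a, b in zip(node_a_hourly_pct, node_b_hourly_pct)
    )
    return {colour: counts.get(colour, 0) for colour in ("green", "yellow", "red")}


# Hypothetical four-hour sample for the two nodes of one cluster.
print(risk_hours([30, 45, 55, 60], [35, 40, 40, 50]))
```

The count of red hours gives a simple estimate of how long service could be affected after a failover.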
Service Availability – Chris
Finally, the Availability Section
Service Availability now major focus, direct customer impact
Used by IT Management
This chart shows both Service and Server Availability.
Blue lines are Windows Availability
Service availability is application uptime and active transactions.
Shows 100% online Service availability with failed over server.
This illustrates the effectiveness of the cluster failover Disaster Recovery strategy.
That completes the 5 sections of the Monthly report.
Daily Report - Chris
Used by Operations Support team, displayed on wall board.
At-a-glance to see if everything was OK yesterday.
Rolling 14 days shows trends including weekends.
Thumbnail for each server, processor and memory.
See Core System, Monday exception.
Use online browse to investigate
Management Report - Colin
Management report for monthly Capacity Management Meeting with Service SMEs
Used for key risk reporting commentary
Feeds into quarterly Capacity Management Meeting with senior IT managers including those responsible for Capacity Planning.
At server cluster level and key individual servers, rolling 6 months
Is any resource critical in need of expenditure?
Used to justify Core Server
Resiliency Report – Colin
Introduced recently as part of specific Key Risk Indicators for Skipton’s senior business management.
Resiliency monthly management report checks critical resources for DR failover
4 traffic lights to reflect KRI Board reporting, based on 80% and combined
Quarterly Meeting - Chris
Essential for Continuous Improvement
Additions/changes to reports.
Outstanding issues, future plans, upgrades
Examples: Disk Volatility, Resiliency
Can be difficult to schedule
Benefits - Colin
ITIL framework – Proved its case – recent Audit and Risk
Comprehensive, Well Defined, Industry Standard
The target is stable, high utilisation within capacities/targets.
Reassures auditors, Skipton’s board they are getting good value for money for equipment with low risk.
Highlighted Core System needed upgrade
Highlighted Exchange under-configurations
Monthly reports useful for New Staff Training
Working with Supplier – Colin
Trusted Partner – show Flexibility on requests and requirements
25 years relationship
Historical focus on Skipton requirements
Expertise, including experience of best practice from other financial institutions
Beware Complacency – focus and services change over time
Make sure it's widely read and get feedback on improvements or relevance
No dedicated resource required but creates a dependency risk
Cost associated with service; challenged during budget planning
What did we learn - Chris
ITIL Framework works for Skipton
Address reporting to the Business, not IT
Always check Management Reports
Regular meetings ensure Continuous Improvement
If you hear of a requirement, include a report
Check it works for the requestor
33. Any Questions?
Thank you for Listening
In preparing presentation:
Colin: Have re-evaluated reporting, including contents and who reads it, and defined current benefits to business.
Chris: Have improved our perception of Skipton’s requirements and have found new ways to improve our services
Do you have any questions?