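5. Difficulties and Challenges in IT Operations and Maintenance Management
• Low end-customer satisfaction
– IT is centered on technology rather than on service
– Satisfaction measurement is unscientific and of little help to IT service work
• Mostly reactive rather than proactive
– Problems are solved by "heroes" rather than by "process"
– Best effort vs. SLA
• Characteristics of services:
– Intangible
– Perceived
– Cannot be stored
– Process-oriented
– Interactive
– Cost
– Quality is hard to control
Questions to consider:
• How can the IT department's maintenance capability be supported and strengthened?
• How can end-customer satisfaction with IT operations management services be improved?
• How can the operating cost of IT infrastructure be reduced?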
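6. Difficulties and Challenges in IT Operations and Maintenance Management
Analysis based on the current state of IT operations
[Figure: Causes of IT system failures]
• Process errors: 40 %
• Human errors: 40 %
• Platform failures: 20 % (hardware, software, network, power failures, and natural disasters)
Contributing factors shown in the figure: insufficient training, untested changes, backup errors, security oversights, high load, weak problem management
Source: Gartner Security Conference presentation "Operation Zero Downtime", D. Scott, May 2002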
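7. Global Trends in IT Operations Management
[Figure: Evolution of the services an IT department provides to end customers over the IT system life cycle. Horizontal axis: Time. Vertical axis: Efficiency. Stages, in order of increasing maturity:]
• Fire Fight
• Help Desk (hotline service)
• Service-Centric Support (integrated operations center)
• Self-Managing Infrastructure (proactive / automated service management)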
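8. Global Trends in IT Services
• Procurement
– Before: separate purchases of routers, switches, desktop PCs, office software, ERP/CRM/SCM application systems, and so on
– Now: consolidated procurement and packaged services
• Role of the IT department
– Before: direct service provider. The IT department operates and maintains the IT systems and responds directly to end-customer service requests.
– Now: IT service controller. The IT department integrates internal and external resources and manages the quality of each service provider. Predictable costs bring predictable quality.
• Transparency
– Before: a "black box" with an opaque service process. End customers do not know the status or progress of the IT department's work.
– Now: transparent service. End customers know the status and progress of the work of the IT department or the outsourced service engineers.
– Tools for transparency: Web, email, SMS, telephone, and other channels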
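9. The Value of ITIL to the Enterprise
• ITIL/ITSM is a public set of processes and methods, based on industry best practice, for standardizing IT service management.
• ITIL (IT Infrastructure Library) is process-oriented and customer-centered. By integrating IT services with the business, it raises the level of an enterprise's IT service provision and operations management.
• It was first developed and published in 1989 by the UK's CCTA, later merged into the Office of Government Commerce (OGC), as ten major processes (version 1.0). In 2001 it was consolidated and extended into 6 modules, forming ITIL version 2.0.
• It has become a de facto standard: more than 20,000 organizations worldwide, leaders in their industries, use the ITIL framework to improve the efficiency of their IT services and …
[Figure: IT service management goals: integrate IT with the business, improve IT service quality, reduce IT service cost. ITIL can help enterprises solve the problems they meet as their use of IT develops.]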
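10. Reference Model – ITIL 2.0
[Figure: the ITIL 2.0 framework, positioned between the business and the technology. Components:]
• Planning to Implement Service Management
• The Business Perspective
• Service Management: Service Support and Service Delivery
• ICT Infrastructure Management
• Security Management
• Application Management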
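11. ITIL Core: Service Support and Service Delivery
[Figure: the service management core]
• Service Support: Service Desk, Incident Management, Problem Management, Configuration Management, Change Management, Release Management
• Service Delivery: Service Level Management, Capacity Management, Availability Management, IT Service Financial Management, IT Service Continuity Management
• Also shown: Security Management, IT Customer Relationship Management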
13. Example of the Processes ITIL Helps an Enterprise Establish
• A user calls the SERVICE DESK
• INCIDENT MANAGEMENT handles the fault involved
• PROBLEM MANAGEMENT investigates the underlying cause
• CHANGE MANAGEMENT submits an RFC to handle the change
• FINANCIAL MANAGEMENT helps measure the costs involved
• IT SERVICE CONTINUITY considers the impact on the recovery plan
• RELEASE MANAGEMENT supervises the distribution of software
• AVAILABILITY MANAGEMENT assesses performance / operation
• CAPACITY MANAGEMENT ensures the required resources are available
• CONFIGURATION MANAGEMENT records all of these events in the CMDB
• SERVICE LEVEL MANAGEMENT ensures the user's requirements have been met
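26. Service Desk
Objectives:
• Provide a single point of contact for customers
• Facilitate the restoration of normal service (acting as the tracker of incidents)
• Generate reports, and communicate and promote services
• Add value to the organization
Activities:
• Provide advice and guidance to IT customers
• Restore normal service quickly for IT customers
• Move from reactive support toward more proactive service
• Monitor and support the achievement of SLA targets
• Communicate and promote services
• Produce and report IT management information
Question to consider: how does the service desk support the achievement of SLA targets?
Example: end customers have been promised no more than 3 hours of network Link Down per month. When a Link Down fault occurs, how should the service desk set its priority?
• Case 1: no Link Down has occurred this month
• Case 2: 2.5 hours of Link Down have already occurred this month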
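The Link Down example can be sketched in code. This is a minimal illustration, assuming a hypothetical rule in which the service desk sets priority from the share of the monthly downtime budget already consumed. The function name, the 50 % and 80 % thresholds, and the priority labels are assumptions for illustration, not part of ITIL or of the original slide.

```python
def link_down_priority(used_hours: float, budget_hours: float = 3.0) -> str:
    """Return a priority label from the share of the monthly SLA downtime budget already used.

    The thresholds (50 % and 80 %) are illustrative assumptions.
    """
    if budget_hours <= 0:
        raise ValueError("budget_hours must be positive")
    used_ratio = used_hours / budget_hours
    if used_ratio >= 0.8:
        return "critical"  # little budget left: further downtime risks breaching the SLA
    if used_ratio >= 0.5:
        return "high"
    return "normal"


# Case 1: no Link Down so far this month
print(link_down_priority(0.0))
# Case 2: 2.5 hours of Link Down already recorded this month
print(link_down_priority(2.5))
```

Under these assumed thresholds the same fault is handled as routine in Case 1 and as urgent in Case 2, which is the point of the exercise: priority depends on the SLA commitment, not only on the fault itself.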
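27. Service Desk
Responsibilities of the service desk:
• Answer customer calls and provide first-line support
• Record, prioritize, and track incidents
• Keep customers informed of the progress of their service requests
• Escalate service requests
• Coordinate second-line / third-line support teams
• Confirm resolution with the customer and close the incident
• Generate reports, focusing on:
– Major incidents, problems, and changes, and the related workarounds
– Incidents that left customers dissatisfied
– Poorly performing IT facilities
– Changes planned for the coming week
Questions to consider:
1. How do the call center, the help desk, and the service desk relate to one another?
2. What is the role and significance of the service desk tracking the incident handling process?
3. How can electronic means be used to build a personalized relationship with end customers and a personalized profile for each (especially suited to desktop PC services)?
4. If network monitoring staff detect a fault, what should they do first?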
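28. How to Evaluate the Work of the Service Desk
[Figure: Plan-Do-Check-Act cycle]
• Plan: project plan, organisation, measures, WBS, scheduling, ...
• Do: implement it
• Check: audit
• Act: new action / improve
Service desk quality measures (examples):
Quantitative:
• First-call resolution rate [%]
• Overall resolution rate [%]
• Customer satisfaction [rating range]
• Problems escalated to second line [%]
• …
Qualitative:
• Complete audit functions assign responsibility to individuals and eliminate buck-passing
• An automatic escalation and reporting mechanism greatly raises the overall service level
• Knowledge sharing eliminates a large amount of repeated work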
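The quantitative measures can be computed from ticket records. The following is a minimal sketch, assuming a hypothetical `Ticket` record with three boolean fields; the field names and the sample data are illustrative assumptions, not taken from any particular service desk tool.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical ticket record; the field names are assumptions."""
    resolved: bool
    resolved_on_first_call: bool
    escalated_to_second_line: bool


def service_desk_kpis(tickets: list[Ticket]) -> dict[str, float]:
    """Compute percentage measures over a non-empty list of tickets."""
    total = len(tickets)
    if total == 0:
        raise ValueError("no tickets to measure")

    def pct(count: int) -> float:
        return 100.0 * count / total

    return {
        "first_call_resolution_pct": pct(sum(t.resolved_on_first_call for t in tickets)),
        "overall_resolution_pct": pct(sum(t.resolved for t in tickets)),
        "escalated_pct": pct(sum(t.escalated_to_second_line for t in tickets)),
    }


# Made-up sample data for illustration only
tickets = [
    Ticket(resolved=True, resolved_on_first_call=True, escalated_to_second_line=False),
    Ticket(resolved=True, resolved_on_first_call=False, escalated_to_second_line=True),
    Ticket(resolved=False, resolved_on_first_call=False, escalated_to_second_line=True),
    Ticket(resolved=True, resolved_on_first_call=True, escalated_to_second_line=False),
]
print(service_desk_kpis(tickets))
```

Customer satisfaction is left out because it comes from a survey rating rather than from ticket fields.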
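36. Problems That May Arise in Incident Management
a. Users bypass the incident management procedure
• Users who have not been trained in the process may not follow the correct procedure and may try to fix errors themselves. As a result, incident records are not updated accurately.
b. Incident overload
• Overload may occur when:
– incidents are not clearly classified
– incidents are not correctly assigned and routed
c. More escalations
• If a support team lacks the appropriate skills and resources, incidents may be escalated quickly to higher-level support teams, adding unnecessary workload for the specialist teams.
d. Service level agreements are not clearly defined
• If the services supported by the incident management process are not clearly defined in service level agreements (SLAs), incident management staff do not know which reported errors and requests qualify as incidents.
e. Cultural change in the organization
• Implementing incident management requires a process orientation. The additional tasks, responsibilities, and stricter discipline may exceed what staff expect, and some will resist.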
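45. Change Management
Objective
• Ensure that standardized methods and procedures are used to control and handle all changes effectively, so that approved changes are implemented efficiently and cost-effectively with minimum risk
Definitions
• Change: an action that changes the state of one or more IT infrastructure CIs
• Standard change (pre-approved)
• Request for Change (RFC)
• Forward Schedule of Change (FSC)
• Change Advisory Board (CAB)
Tasks
• Accept, record, approve, plan, test, implement, and review requests for change
• Provide change reports on the IT infrastructure
• Drive updates to the CMDB
Question to consider: when should a change be implemented, and why? Take an optical cable cutover as an example. Assess whether the change can be implemented outside service hours, and whether several changes can be implemented together, to reduce the impact on customers and the business.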
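The check on whether a change fits outside service hours can be sketched as follows. This is an illustration only, assuming daily service hours of 08:00 to 20:00; the function name and the service hours are hypothetical and would in practice come from the SLA.

```python
from datetime import datetime, time, timedelta


def overlaps_service_hours(
    start: datetime,
    end: datetime,
    service_start: time = time(8, 0),   # assumed start of daily service hours
    service_end: time = time(20, 0),    # assumed end of daily service hours
) -> bool:
    """Return True if the change window [start, end) overlaps service hours on any day it touches."""
    if end <= start:
        raise ValueError("end must be after start")
    day = start.date()
    while day <= end.date():
        day_service_start = datetime.combine(day, service_start)
        day_service_end = datetime.combine(day, service_end)
        # Two intervals overlap when each starts before the other ends.
        if start < day_service_end and day_service_start < end:
            return True
        day += timedelta(days=1)
    return False


# A night-time cable cutover from 22:00 to 02:00 stays outside the assumed service hours.
print(overlaps_service_hours(datetime(2024, 1, 10, 22, 0), datetime(2024, 1, 11, 2, 0)))
```

A window that returns False is a candidate for scheduling; several changes that all fit the same window could then be batched to limit the number of service interruptions.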
46. Configuration Management
• Identifying and defining Configuration Items (CIs)
• Planning, design, and management of the Configuration Management Database (CMDB)
• Regular verification of CMDB accuracy and completeness
• Detailed reporting of IT assets
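47. Hardware CI Example
[Figure: hardware CI hierarchy. The scope covers a mainframe, file servers, and a network containing modems, a hub, and PCs. At the CI level, each PC breaks down into keyboard, CPU, and mouse.]
• Relationships: "is connected to", "is part of"
• Attributes: Owner, Status, Location, Version, Serial Number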
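The attributes and relationships in this example map naturally onto a small data structure. The sketch below is a hypothetical in-memory model, not the schema of any real CMDB product; the class names, method names, and the CI names such as `PC-01` are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigurationItem:
    """A CI carrying the attributes listed on the slide."""
    name: str
    owner: str = ""
    status: str = ""
    location: str = ""
    version: str = ""
    serial_number: str = ""


@dataclass
class CMDB:
    """Hypothetical in-memory store of CIs and their relationships."""
    items: dict[str, ConfigurationItem] = field(default_factory=dict)
    # Each relationship is stored as (source CI, relation, target CI).
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, ci: ConfigurationItem) -> None:
        self.items[ci.name] = ci

    def relate(self, source: str, relation: str, target: str) -> None:
        if source not in self.items or target not in self.items:
            raise KeyError("both CIs must be registered before relating them")
        self.relations.append((source, relation, target))

    def related(self, target: str, relation: str) -> list[str]:
        """Names of CIs that have the given relation to the target CI."""
        return [s for s, r, t in self.relations if r == relation and t == target]


cmdb = CMDB()
for name in ("HUB", "PC-01", "Keyboard", "CPU", "Mouse"):
    cmdb.add(ConfigurationItem(name=name, status="in use"))
for part in ("Keyboard", "CPU", "Mouse"):
    cmdb.relate(part, "is part of", "PC-01")
cmdb.relate("PC-01", "is connected to", "HUB")

print(cmdb.related("PC-01", "is part of"))
```

Storing relationships separately from the items makes it easy to ask which CIs depend on a given component, which is the basis for impact analysis.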
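48. Release Management
Objective
• Release management takes a holistic view of a change to an IT service and ensures that all aspects of a release, technical and non-technical, are considered together.
Definitions
• Release
– Delta Release
– Full Release
– Package Release
• Emergency Release
• Release Policy
Tasks
• Release planning
• Designing, building, and configuring a release
• Release acceptance
• Rollout planning
• Communication, preparation, and training
• Distribution and installation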
50. Financial / Cost Management
• Set and control IT budgets
• Categorize, account for, and control IT costs
• Charge for services
• Report on IT finance
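52. IT Service Continuity Management
Objective
• Ensure that the required IT technical and service facilities can be recovered within the agreed time
Definitions
• IT Service Continuity Management (ITSCM)
• Business Continuity Management (BCM)
• Crisis
• IT service continuity planning is a systematic approach to creating plans and procedures (which must be regularly updated and tested) to prevent, handle, and recover from the loss of critical services
Key points
• Perform risk analysis and risk management on the IT infrastructure
• Take countermeasures to reduce the impact of a crisis
• Develop, maintain, and regularly test continuity plans
• Emphasize prevention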
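55. Availability Management
Key points
• At the core of business and customer satisfaction
• Analyze and report on the performance of the IT infrastructure
• Supply data to the Service Level Management process
• Raise requests for change based on the availability plan to improve IT infrastructure performance and so reduce risk
Concepts
• Availability
• Reliability
• Maintainability (internal)
• Serviceability (external)
• Security (confidentiality, integrity, and availability)
[Figure: incident life cycle. Sequence: incident, detection, diagnosis, repair, recovery, return to normal operation, next incident. Intervals marked: detection time, response time, repair time, recovery time.]
• Mean Time To Repair (MTTR), or downtime
• Mean Time Between Failures (MTBF), or uptime
• Mean Time Between System Incidents (MTBSI)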
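The three intervals are related: MTBSI is the sum of MTBF (uptime) and MTTR (downtime), and availability is the share of MTBSI spent up. A minimal sketch of the arithmetic follows; the function names and the sample figures are illustrative assumptions.

```python
def mtbsi(mtbf_hours: float, mttr_hours: float) -> float:
    """Mean time between system incidents = uptime + downtime."""
    return mtbf_hours + mttr_hours


def availability_pct(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as a percentage: uptime divided by total cycle time."""
    total = mtbsi(mtbf_hours, mttr_hours)
    if total <= 0:
        raise ValueError("MTBF + MTTR must be positive")
    return 100.0 * mtbf_hours / total


# Illustrative figures: 718 hours of uptime followed by a 2-hour repair
print(mtbsi(718, 2))
print(round(availability_pct(718, 2), 2))
```

The same formula shows the two levers available: lengthen MTBF (reliability) or shorten MTTR (maintainability and serviceability).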
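57. Overall Architecture of China Telecom's Managed Service Products
[Figure: service levels evolving from 1994 to 2006]
• Do It Yourself: provides WAN service; CPE equipment is supplied by the customer
• Infrastructure: provides base-level IDC and storage services
• NW&Sys Maintenance: provides assistance services such as monitoring and alerting
• Total: provides fully managed services
Service products shown:
• Application outsourcing service: 商务领航
• Network, PBX, video, call center, and IT equipment leasing and outsourcing services
• Network management expert service; network security expert service
• Disaster backup system outsourcing service
• Storage service; IDC service
• Platforms: network application management platform; network application security management platform
Drawing on China Telecom's extensive network resources, operating experience, and channel advantages, and working with the industry's leading technology product and service providers, we offer modular managed service products that help customers build a reliable IT operations and maintenance support system at low cost.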
After ten years of IT development in China, many industries and organizations have completed a substantial degree of informatization and reached a fairly advanced stage; many financial institutions, for example, have completed data centralization. Once large-scale construction is finished, the long-term problem of operating and maintaining the systems follows immediately. About 80 % of the time in an IT project's life cycle relates to service and operations.

Worldwide, the way IT departments serve their internal end customers has passed through four stages:
• Stage 1, fire fighting: people are sent wherever a problem appears. Staff are often overwhelmed, response is slow, callers cannot reach anyone, and end-customer satisfaction is low.
• Stage 2, hotline service: to raise satisfaction, many enterprises set up a dedicated hotline. After a fault is reported, an engineer is assigned promptly, the handling is tracked, and the end customer is kept informed. Satisfaction gradually improves.
• Stage 3, integrated service and operations center: this stage builds on the help desk in several ways. First, problem management is added on top of fault handling, analyzing root causes and removing latent faults. Second, the service process is formalized. Third, change management, asset and configuration management, and service level (SLA) management are added. A good service management system is also required, not just a network monitoring system. At PHILIPS, for example, when the service desk / hotline receives a fault report from a customer, it immediately knows the cumulative fault duration already recorded that month, compares it with the service commitment made to internal end customers, and sets the priority of the new fault accordingly.
• Stage 4, proactive and automated maintenance management: IT systems report problems automatically, and SLAs are tied closely to business impact.
At present many Chinese enterprises are still at the first or second stage, and moving up to a higher level is urgent.

Today, we are going to start with this chart showing the evolution of Service Management over time. The characteristics of each maturity level are as follows:
• Fire fight level: Is your help desk continually dodging bullets and fighting fires? Is "Joe" the help desk analyst frustrated and worn out from being the dart board for all IT operation problems? Characteristics: operator, labor driven, chaotic, and dispatch focus.
• React level: Your help desk tracks incidents till closure, but there is little automation, no integration with IT infrastructure tools, and no process definition. Tickets are "logged and flogged". Characteristics: generalist, Single Point of Contact (SPOC), basic incident and problem handling; emphasis on break/fix, minimal measurement, rudimentary availability SLA, and lots of data without context.
• Service centric support level: Your "Service Desk" manages all service requests between the IT infrastructure and the service consumers (employees, customers, and partners). Characteristics: problem, change, and asset management; operational level agreements established; prioritized incident handling plus root cause analysis; end user SLA for response and availability; organizational knowledge leveraged for incident resolution; automated problem requests via integrations with NSM, Asset Mgmt, Software Delivery, etc.
• Self-managing infrastructure level: Applications, devices, and systems automatically report problems to the Service Desk. SLAs are aligned with business objectives. Characteristics: linkages between IT and Business; availability and performance directly linked to business priorities; integrated Service Center and Operations; complex business intelligence drives business and IT planning.
While the market is driving towards the "Self-Managing Infrastructure" level, few customers are at that level today. Actually, many are somewhere between the "Fire Fight" and the "React" levels. Within this evolution process there is no wrong place to be. The primary goal is to progress from where you are to a higher evolution level so your company can achieve greater efficiencies while reducing costs.
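ITIL is a process-based, customer-oriented guidance framework for IT service management. It moves away from traditional IT management's focus on technology, shifting from technology management to process management and then to service management. In the field of ITSM (IT service management), this widely followed "best practice" for enterprise IT takes process as its main thread, standardization as its framework, and management as its core.

A brief history of ITIL: The UK Office of Government Commerce (OGC), which absorbed the Central Computer and Telecommunications Agency (CCTA) in 2001, had worked since the 1980s on the problem of "poor IT service quality". In 1989 the CCTA published a 10-volume set of IT service management guidance that systematically described 10 core IT service management processes distilled from "best practice"; this was ITIL 1.0. In 2001 the OGC revised and extended ITIL 1.0, combining the original 10 guides into two books, Service Delivery and Service Support (together forming the "service management module" among ITIL's 6 modules), and adding 5 further modules such as Application Management and Security Management. These 6 modules make up ITIL 2.0. In the late 1990s ITIL's ideas and methods were widely adopted and further developed in the United States, Australia, South Africa, and other countries. In 2001 the British Standards Institution (BSI) formally released BS 15000, a British national standard based on ITIL, at the annual conference of the IT Service Management Forum (itSMF). BS 15000 was subsequently taken up by the International Organization for Standardization (ISO) as the basis of the international standard for IT service management, ISO/IEC 20000, published in 2005. ITSM is becoming an emerging field with broad participation from major IT vendors, governments, enterprises, and experts, and it will have a far-reaching influence on the direction of IT and enterprise informatization.

The six modules are: The Business Perspective, Service Management, ICT Infrastructure Management, Planning to Implement Service Management, Application Management, and Security Management. The service management module is the core of ITIL. It comprises two sub-modules, Service Support and Service Delivery, which break down into ten processes and one function, the service desk.

ITIL was originally developed by the UK government to improve IT service management: Office of Government Commerce (OGC), formerly Central Computer and Telecommunications Agency (CCTA).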
It comprises two main areas: IT Service Delivery and IT Service Support.
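The core of ITIL is the service management module: the ten typical service management processes and one service management function contained in its two sub-modules, Service Support and Service Delivery.

Service Support processes mainly face users (End-Users). They ensure that users receive appropriate services to support the organization's business functions. Service Support comprises the service desk function, which embodies service contact and communication, and 5 operational-level processes: Incident Management, Problem Management, Configuration Management, Change Management, and Release Management. The main role of these 5 processes is to ensure that the quality of service provided by the IT Service Provider meets the requirements of the service level agreement (SLA).

Service Delivery comprises 5 processes: Service Level Management, IT Service Financial Management, Capacity Management, IT Service Continuity Management, and Availability Management. These processes must answer questions such as "what does the customer need?", "what resources are needed to meet that need?", "what do those resources cost?", and "how should service cost be balanced against service benefit (the service level achieved)?". The 5 core Service Delivery processes are therefore all tactical-level service management processes.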
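In this pyramid, the higher the level, the more it relates to the business; the lower the level, the more it deals with detailed, fine-grained IT specifics. Together this gives a complete picture of how ITIL relates to business objectives.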
Include a brief discussion on the differences between Incidents and Problems and the virtual nature of the C.M.D.B
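• IT strategy determines: the IT processes required
• Process determines: the technology required; the people required and how they are organized
• Technology determines: how people are trained; the competencies required of people
A revolution in technology can change IT strategy. The centralization of banking operations, for example, was a strategic change brought about by advances in technology. The choice of technology can affect process: choosing advanced technology can optimize processes and raise efficiency. Applying technology raises the productivity of staff.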
ITIL promotes:
• a customer culture
• a common terminology
• an 'end to end' view of IT services
• improvement in communication
• the breakdown of silo mentality
• a holistic view of service provision
• a recognised qualification system for individuals
ITIL is:
• process based
• impartial
• technology independent
• public domain
• common sense
• scalable
• regularly updated
• internationally relevant
• linked to a standard (BS15000)
• Tools support the process, not the other way round
• Common tool for Incident, Problem & Change Management
• Automated workflows & escalation
• Links to configuration data
• Links to systems/network management tools
• Effective support for metrics
• Tools can greatly assist with efficiency
Results of Service Management
Managing IT service is really about supporting the enabling technology to ensure:
• Customers understand and access support mechanisms
• IT is there when you need it
• Service is restored quickly
• Elimination of recurring problems
• You know when things are changing before they actually change
• Unexpected changes that you find out about when they cause you grief
• Hard to find standards; not being able to find the documented standard
• Confusion on support; not knowing who supports what
• Multiple handoffs between service providers in response to break/fix
• Having to reassess and analyze information that someone else has already collected
• Missing measures and metrics that tell us how we are doing
• The lack of a repository where all service providers can access group knowledge and known errors
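Incident Management. An incident is any event that deviates from standard operation and has caused, or may cause, an interruption to or a reduction in the quality of service. The purpose of incident management is to restore normal service operation as quickly as possible when an incident occurs and to keep it from interrupting the business, so as to ensure the best possible level of service availability. To achieve this, the process must make the best use of resources to support the business, develop and maintain effective incident records, and design and apply a consistent approach to incident reporting.

Problem Management. A problem is the underlying cause of one or more incidents. Problem management is the process of minimizing the impact on customers of defects and errors arising from the service infrastructure, human error, and external events, and of preventing their recurrence. An incident does not necessarily indicate a problem, and a problem need not wait for an incident before it is discovered. Incident management and problem management share the same goal but differ in emphasis. The former stresses "restoring service as quickly as possible" and may use any suitable measure, including temporary ones. The latter stresses "solving the problem at its root", so that the incident does not recur, or so that a good response is in place if it does.

Change Management. A change is the addition, modification, or removal of IT infrastructure components (including hardware, network, software, applications, environment, systems, and related documentation). The purpose of change management is to use standard methods and procedures to handle all changes quickly and effectively, minimizing the impact of change-related incidents on service.

Configuration Management. Configuration management is the process of identifying and defining the configuration items of a system, recording and reporting the status of configuration items and requests for change, and verifying the correctness and completeness of configuration items. Its purpose is to provide a logical model of the IT infrastructure that supports the other service management processes, especially change management and release management. To do so, it accounts for all IT assets, supplies accurate information to other processes, provides the basis for incident, problem, change, and release management, and verifies infrastructure records, correcting them where necessary.

Release Management. A release is a set of new or changed configuration items that have been tested and introduced into the live environment. The purpose of release management is to ensure that releases succeed. It applies mainly to large or critical hardware, major software, and the packaging or batching of a set of changes.

CMDB: Configuration Management Database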
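Service Level Management. Service level management is the process of defining, negotiating, agreeing, and measuring the quality of the services provided to customers. The service level agreement describes the services provided and their quality levels, and sets out the responsibilities, rights, and obligations of both parties. Service level management is an important safeguard for the successful operation of IT services. Just as the service desk is the "point of contact" for the Service Support processes, the service level agreement is the point where the Service Delivery processes connect. It states quantitative targets for the financial, continuity, availability, and other aspects of the IT service, and specifies what happens when those targets are not met. It also details how incidents are escalated.

IT Service Financial Management. IT service financial management is the process responsible for budgeting and accounting for the cost of the IT services a provider delivers, and for charging customers accordingly. It has three sub-processes: IT investment budgeting, IT service cost accounting, and service charging. By quantifying service costs, it aims to reduce the risk of overspending, cut unnecessary waste, and guide customer behavior sensibly, so that the IT services provided are cost-effective. The budgeting and accounting information it produces supports decisions in service level management, capacity management, IT service continuity management, change management, and other processes.

IT Service Continuity Management. IT service continuity management is the process of ensuring that sufficient technical, financial, and management resources are available to keep IT services running after a disaster. Its focus is the ability to continue providing a predetermined level of IT service after a service failure, and so to support the continued operation of the business. It must therefore be grounded in the organization's business continuity management.

Availability Management. Availability management is the process of analyzing the availability requirements of users and the business and optimizing and designing the availability of the IT infrastructure accordingly, so that growing availability requirements are met at a reasonable cost. It is a forward-looking process: by establishing the real availability requirements of the business and its users, it bases the design of IT services on actual need, avoids over-specifying availability levels, and reduces the operating cost of IT services.

Capacity Management. Capacity management is the process of configuring appropriate service capacity, within the constraints of both cost and business demand, so that the organization's IT resources deliver the greatest benefit. It has three sub-processes: business capacity management, service capacity management, and resource capacity management. Business capacity management focuses on current and future business requirements. Service capacity management focuses on whether the quality of current IT services can support normal business operation. Resource capacity management focuses on the technical foundation on which all services depend, ensuring that every component of the IT infrastructure performs at its best.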
Service Desk Introduction Many organizations have implemented a Service Desk as a central point of contact for handling customer, user and related issues. The Service Desk is an extension of call centers and help desks, offering a broader range of services and a more globally focused approach, allowing business processes to be integrated into the service management infrastructure. The Service Desk provides a vital day-to-day point of contact between customers, users, IT services and third-party support organizations. For many customers the Service Desk is the most important function in an organization, as, for them, Service Desk provides the only indication of the service levels and professionalism offered by the organization. This means that the Service Desk has great strategic importance, representing the interests of the customer to the service team. The Service Desk thus helps to ensure the delivery of customer satisfaction. The provision of a Service Desk can benefit any organization, independent of the size of their support staff or user base.
Incident Management
The goal of Incident Management is to ensure that high-quality service levels are maintained and that service availability meets the customers' requirements. Incident Management is a reactive task: it reduces or eliminates the effects of actual or potential disturbances in IT services, so that users can get back to work as soon as possible. To this end, incidents are recorded, classified, and allocated to appropriate specialists; their progress is monitored; and they are resolved and closed.
Problem Management " Problem Management is concerned with the resolution and prevention of incidents; in other words, with putting right that which has gone wrong, preventing recurrence , and with preventing things from going wrong at all." The objective of Problem Management is to minimize the adverse impact of incidents and problems on the business that are caused by errors in the IT infrastructure and to prevent recurrence of incidents related to these errors. Problem management seeks to get to the root cause and initiate action to remove the error. The Problem Management process has both reactive and proactive aspects. The reactive aspect is concerned with solving Problems in response to one or more Incidents. Proactive Problem Management is concerned with identifying and solving Problems and Known errors before Incidents occur in the first place.
Recording how each error was resolved is important, because the record is used to match future incidents against known errors.
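As an illustration of incident matching, the sketch below looks up a new incident description against recorded known errors. It is a deliberately simple keyword match under stated assumptions: the record format, the function name, and the sample resolutions are all hypothetical, and a real service desk tool would use its own matching logic.

```python
from typing import Optional

# Hypothetical known-error records: symptom keywords mapped to the recorded resolution
KNOWN_ERRORS = {
    frozenset({"printer", "offline"}): "Restart the print spooler service",
    frozenset({"vpn", "timeout"}): "Renew the client certificate",
}


def match_known_error(description: str) -> Optional[str]:
    """Return the recorded resolution whose keywords all appear in the description, if any."""
    words = set(description.lower().split())
    for keywords, resolution in KNOWN_ERRORS.items():
        if keywords <= words:  # every keyword is present in the description
            return resolution
    return None


print(match_known_error("Printer on floor 3 is offline again"))
print(match_known_error("Mail client crashes on startup"))
```

The second call returns `None`, which in this sketch would mean the incident has no recorded resolution and may need to be passed to problem management.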
Change Management
To make an appropriate response to a Change request entails a considered approach to assessment of risk and business continuity, Change impact, resource requirements, and Change approval. This considered approach is essential to maintain a proper balance between the need for Change and the impact of the Change.
Change Management is responsible for managing Change processes involving:
• hardware
• communications equipment and software
• system software
• "live" application software
• all documentation and procedures associated with the running, support, and maintenance of live systems
Change Management should consider:
• what should be in the plan
• ownership
• circulation
• what key businesses should be supported
• how those businesses are supported by IT
• links to business continuity and IT contingency plans
• the critical components
• timescales
• risks
• regression strategy
• invocation
CAB members: Change Manager, Customers, User manager, Application developers, Technical consultants, Service staff (as required), Third-party representatives (as required)
Introduction
Every IT organization has information about its IT infrastructure. Such information is particularly likely to be available after major projects, which are generally followed by an audit and impact analysis. However, the art is in keeping the information up to date. Configuration Management aims to provide reliable and up-to-date details about the IT infrastructure. Importantly, these details include not just details on specific items in the infrastructure (Configuration Items, or CIs), but also how these CIs relate to one another. These relationships form the basis for impact analysis.
The processes of Configuration Management include:
• Check that changes in the IT infrastructure have been recorded correctly, including the relationships between CIs
• Monitor the status of the IT components
• Ensure that there is an accurate picture of the versions of Configuration Items (CIs) in existence
Difference between the CMDB and an Asset DB: an Asset DB focuses on managing assets of value, depreciation, and the like. The CMDB records asset information and also detailed records such as change requests, problem records, and configuration information, together with the relationships among them.
A CI can be hardware, software, an incident, a change request, a service level, and so on. It can be as large as an entire system or as small as a single component. Relationships between CIs include: part of, connect to, use, copy, parent/child.
Release Management Introduction
Release Management aims to ensure the quality of the production environment by using formal procedures and checks when implementing new versions. Release Management is concerned with implementation, unlike Change Management, which is concerned with verification. Release Management works closely with Configuration Management and Change Management to ensure that the common CMDB is updated with every release. Release Management also ensures that the contents of releases in the Definitive Software Library (DSL) are updated. The CMDB also keeps track of hardware specifications, installation instructions, and network configuration. Stocks of hardware, particularly standardized basic configurations, are stored in the Definitive Hardware Store (DHS). In general, however, Release Management is primarily concerned with software.
In large projects in particular, Release Management should be part of the overall project plan to ensure funding. An annual fixed budget can be allocated to routine activities such as minor changes. Although costs will be incurred when setting up the process, these are minor compared to the potential costs associated with poor planning and control of software and hardware, such as:
• Major interruptions due to poorly planned software releases
• Duplication of work because there are copies of different versions
• Inefficient use of resources because nobody knows where the resources are
• Loss of source files, which means that software has to be purchased again
• No virus protection, which means that entire networks need decontamination
Release Management should be used for:
• large or critical hardware rollouts, especially when there is a dependency on a related software Change in the business systems, i.e. not every single PC that needs to be installed
• major software rollouts, especially initial instances of new applications, along with accompanying software distribution and support procedures for subsequent use if required
• bundling or batching related sets of Changes into manageable-sized units
SLM Introduction
Service Level Management (SLM) is the name given to the processes of planning, co-ordinating, drafting, agreeing, monitoring, and reporting on SLAs, and the on-going review of service achievements to ensure that the required and cost-justifiable service quality is maintained and gradually improved. SLAs provide the basis for managing the relationship between the provider and the Customer.
When the first ITIL SLM book was published in 1989, very few organizations had SLAs in place. Today most organizations have introduced them, though with varying degrees of success. This version includes some coverage of the common causes of failure, and guidance on how to overcome these difficulties.
SLM is essential in any organization so that the level of IT Service needed to support the business can be determined, and monitoring can be initiated to identify whether the required service levels are being achieved and, if not, why not.
Service Level Agreements (SLAs), which are managed through the SLM process, provide specific targets against which the performance of the IT organization can be judged. SLAs should be established for all IT Services being provided. Underpinning contracts and Operational Level Agreements (OLAs) should also be in place with those suppliers (external and internal) upon whom the delivery of service is dependent.
• SLA: an agreement between the IT service provider and the IT customer. Content: service hours, availability, reliability, support, resolution time, service reporting, etc.
• OLA: an agreement between an internal IT department (e.g. Network Management) and Service Level Management
• UC: an underpinning contract with an external supplier
Financial Management Introduction
As the number of users grows, the IT budget keeps growing. Customers grow more concerned about IT spending, and less able to map this spending to the business. Financial Management is developed to structure the management of the IT infrastructure to promote the efficient and economic use of IT resources.
An effective cost control system should fulfill the following criteria:
• Support the development of an investment strategy that allows for the flexibility provided by modern technology
• Identify priorities in the use of resources
• Cover the costs of all IT resources used in the organization, including updating relevant information
• Support management with day-to-day decisions so that long-term decisions can be taken with the lowest possible financial risk
• Be flexible and able to respond quickly to changes in the business activities
Within an IT organization it is visible in three main processes:
• Budgeting is the process of predicting and controlling the spending of money within the organization. It consists of a periodic negotiation cycle to set budgets (usually annual) and the day-to-day monitoring of the current budgets.
• IT Accounting is the set of processes that enable the IT organization to account fully for the way its money is spent (particularly the ability to identify costs by Customer, by service, by activity). It usually involves ledgers and should be overseen by someone trained in accountancy.
• Charging is the set of processes required to bill Customers for the services supplied to them. This requires sound IT Accounting, to a level of detail determined by the requirements of the analysis, billing, and reporting processes.
On-going, Ad hoc, Regularly
Introduction
A disaster is much more serious than an incident. A disaster is a business interruption: all or part of the business is not "in business" following a disaster. Familiar disasters include fire, lightning, water damage, burglary, vandalism and violence, large-scale power outages, and hardware failure. Some companies could have prevented serious problems by thinking about and developing Business Continuity Plans. Furthermore, businesses are becoming increasingly dependent on IT services, which means that the impact of the loss of services also increases and becomes less acceptable. It is therefore essential to consider how business continuity can be safeguarded.
Traditional contingency planning used to be part of the remit of the IT organization. At present, however, it is much more closely integrated with many aspects of the business. Where the traditional contingency planning process was primarily reactive (what to do in the event of a disaster), the new IT Service Continuity Management process emphasizes prevention, i.e. avoiding disasters.
Why ITSCM:
• Potential lower insurance premiums
• Regulatory requirements
• Business relationship
• Positive marketing of contingency capabilities
• Organizational credibility
• Competitive advantage
Scope considerations:
• The dependence on technology, its Infrastructure, and any external providers of support services
• The number and location of the organization's offices and the services performed in each
• The number of critical business processes and the level of integration between them
• The level of services that need to be provided to the business to support those critical business processes
• Any limitation in the provision of ITSCM mechanisms
• The organization's attitude towards risks
An organization's structure, culture, and academic direction (both business and technology) are key drivers in determining the scope of ITSCM. Significant benefits can be derived from the involvement of someone with good specific business and Infrastructure knowledge and experience, who can ensure that these are considered. At the broadest level, the scope of ITSCM is usually defined in terms of the:
• business processes to be covered and their IT support requirements (e.g. systems, networks, communications, support staff skills, data, and documentation)
• risks that need to be addressed
• Stage 1: Set policy, defining the scope and the responsibilities of managers and staff in the organization. Allocate resources.
• Stage 2: Business Impact Analysis. Find the critical business processes; identify single points of failure and potential damage or loss (identify risks, assess threat and vulnerability levels); determine the time within which minimum levels of staffing, facilities, and services can be made live after a disruption in service.
• IT recovery options: do nothing, manual work-arounds, reciprocal arrangement, gradual recovery (cold standby), intermediate recovery (warm standby), immediate recovery (hot standby)
• Stage 3: Implement risk reduction measures (offsite storage, RAID arrays, disk mirroring) and implement stand-by arrangements.
Availability Management Introduction
The pace of technological development keeps increasing. Because of this, within many organizations the hardware and software that is needed keeps expanding and is becoming more diverse, despite standardization efforts. Old and new technologies have to work together. This results in additional network structures, interfaces, and communications facilities.
Business operations are becoming increasingly dependent on reliable technology. A few hours of computer downtime can have a serious impact on the turnover and image of a business, particularly now that the Internet is developing into an electronic marketplace. As the competitors' businesses are only a mouse click away, customer loyalty and satisfaction are now more important than ever. This is one of the reasons why computer systems are now commonly expected to be available 7 days a week, 24 hours a day.
Guiding Principles:
• Availability is at the core of business and user satisfaction
• Recognizing that when things go wrong, it is still possible to achieve business and user satisfaction
• Improving availability can only begin after understanding how the IT Services support the business
Effective Availability Management influences Customer satisfaction and determines the marketplace reputation of the business.