Comp8 unit9a lecture_slides

Installation and Maintenance of Health
IT Systems
Creating Fault-Tolerant Systems,
Backups, and Decommissioning
Lecture a
his material (Comp 8 Unit 9) was developed by Duke University, funded by the Department of Health and Human
Services, Office of the National Coordinator for Health Information Technology under Award Number
IU24OC000024. This material was updated in 2016 by The University of Texas Health Science Center at
Houston under Award Number 90WT0006.
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.

Learning Objectives
1. Define availability, reliability, redundancy, and fault
tolerance (Lecture a)
2. Explain areas and outline rules for implementing fault
tolerant systems (Lecture a)
3. Perform risk assessment (Lecture a)
4. Follow best practice guidelines for common
implementations (Lecture b)
5. Develop strategies for backup and restore of operating
systems, applications, configuration settings, and
databases (Lecture c)
6. Decommission systems and data (Lecture c)
2

Redundancy and Fault Tolerance
• Dependence on EHRs is increasing.
• EHR systems require redundant, or “failover”, resources
and fault tolerance to ensure uptime and data integrity so
that it can perform as specified.
• “Failure” vs “fault”: fault is the cause of a failure of the
system to comply with its specifications or precise
requirements.
• Fault tolerance is resilience in a system, or ability to
continue performing to specification despite problems
• Ask vendor how fault tolerance is designed/coded into
the EHR application.
3

Creating Fault Tolerance
• Redundancy
– Secondary or “backup”
systems
• Reliability
– Infrequent failure
– Redundant components
• Availability
– Accessible when needed –
no downtime
– Available systems are
reliable and accessible
• Computer hardware
– Servers and workstations
• Data storage
– Hard disks
• Network and Power
– Network switches and
Internet access
– Mains, generators,
batteries
• Virtualization
– Isolation of system from
hardware
4

System Failure and Downtime
• Forrester Consulting report on server failure during prior two
years:
– ¾ experienced downtime.
– Only 1% of server outages were resolved within five
minutes.
– 68% had impact on clinical activities.
– 50+% affected administrative processes.
• How much downtime is acceptable?
– Required good understanding of business’ processes
Critical system downtime can have significant negative
impact on patient health
(Forrester Consulting Report, 2010)
5

Three Areas for Fault Tolerance
1. Hardware fault tolerance – compensate for hardware failure
– Often simplest to implement
– Extra hardware resources as secondaries or backups
– E.g., secondary network cards, error checking and correcting (ECC)
memory, redundant power supplies, redundant disks / file storage
2. Software fault tolerance – compensate for poor programming or data
– Involves program verification (code review) and assertion checking
– Compensating for faults such as poorly formatted input data
– E.g., sanity check, double-entry comparison, and multiple-version
programs
3. System fault tolerance – compensate for non-computer or inter-device
failures
– Most complex, highest number of variables
– System may include facilities that are not computer-based
– E.g., detection of sensor failure, graceful reaction to intersystem
communication failure, graceful shutdown in unexpected circumstances
(A Conceptual Framework for System Fault Tolerance - 1.1 What is a System?, 1995)
6

Six Rules of Fault Tolerance
In “A Conceptual Framework for Systems Fault Tolerance”,
the Center For High Integrity Software Systems Assurance
summarizes 6 rules:
1.Know precisely what the system is supposed to do.
2.Look at what can go wrong.
3.Study your application & determine appropriate fault
containment regions & earliest feasible time to deal with
potential faults.
4.Completely understand application requirements & use
them to make appropriate time/space trade-offs.
5.Concentrate on credible faults first.
6.Determine application failure margins.
(A Conceptual Framework for System Fault Tolerance - 5 Putting It All Together, 1995)
7

(cont’d)
• Rule 1: Know precisely what the system is
supposed to do.
– How long can system be allowed to deviate from
specifications before being declared a “failure”?
– What abnormal conditions must be
accommodated?
• Rule 2: Look at what can go wrong.
– Group causes into classes.
– Define “fault floor”.
8

(cont’d – 2)
• Rule 3: Study your application & determine
appropriate fault containment regions & earliest
feasible time to deal with potential faults.
– Fault tolerance generally means more resources
(time & space)
• Rule 4: Completely understand application
requirements & use them to make appropriate
time/space trade-offs.
– Consider costs, & classify faults by likelihood.
9

(cont’d – 3)
• Rule 5: Concentrate on credible faults first.
– Ignore less likely faults unless they require little
additional cost. Mitigate the most likely faults first.
• Rule 6: Determine application failure margins.
– Balance the degree of fault tolerance needed with the
cost of implementation.
– Does a small expenditure now save a great deal
later?
10

Risk Assessment
Risk Assessment
• Identify what is to be protected
– Examples: EHR server, or clinical record
– Include rating of importance
– Types of loss or liability
• Identify risks to each component
– Examples: Power failure, or record alteration
– Risk = Threat x Probability x Impact
– Intentional or Accidental, Human or System, Internal or External
• Identify mitigation strategies for each risk
– Examples: UPS with power monitoring, or automatic backup
– Policies (for people) or Controls (for systems or equipment)
(Benson, n.d., Maniscalchi. 2009) 11

Summary – Lecture a
• Fault tolerance is running despite
problems
• Implemented using Redundancy to
increase Reliability and provide Availability
• Three areas of Hardware, Software, and
System
• Risk assessment to identify assets, risks,
and mitigation
12

References – Lecture a
References
Benson C. Security Planning. (n.d.) Available from: http://technet.microsoft.com/en-
us/library/cc723503.aspx
Maniscalchi, J. Threat vs. Vulnerability vs. Risk. (June 2009) Available from:
https://www.pinkerton.com/blog/risk-vs-threat-vs-vulnerability-and-why-you-should-
know-the-differences/
Heimerdinger, W. L., Weinstock, C. B., A Conceptual Framework for System Fault
Tolerance (1992, October). Retrieved from:
http://www.sei.cmu.edu/reports/92tr033.pdf
Server Availability Trends In The Time Of Electronic Health Records. (January 2010)
Forrester Research, Inc. Available at
http://www.stratus.com/assets/ServerAvailabilityTrends_EHR_ForresterPaper.pdf
13

Installation and Maintenance of
Health IT Systems
Lecture a
This material was developed by Duke University,
funded by the Department of Health and Human
Services, Office of the National Coordinator for
Health Information Technology under Award
Number IU24OC000024. This material was
updated in 2016 by The University of Texas Health
Science Center at Houston under Award Number
90WT0006.
14

Comp8 unit9a lecture_slides

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Comp8 unit9a lecture_slides

Similar to Comp8 unit9a lecture_slides (20)

More from CMDLMS

More from CMDLMS (20)

Recently uploaded

Recently uploaded (20)

Comp8 unit9a lecture_slides

Editor's Notes