Installation and Maintenance of Health
IT Systems
Creating Fault-Tolerant Systems,
Backups, and Decommissioning
Lecture a
This material (Comp 8 Unit 9) was developed by Duke University, funded by the Department of Health and Human
Services, Office of the National Coordinator for Health Information Technology under Award Number
IU24OC000024. This material was updated in 2016 by The University of Texas Health Science Center at
Houston under Award Number 90WT0006.
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.
Learning Objectives
1. Define availability, reliability, redundancy, and fault
tolerance (Lecture a)
2. Explain areas and outline rules for implementing
fault-tolerant systems (Lecture a)
3. Perform risk assessment (Lecture a)
4. Follow best practice guidelines for common
implementations (Lecture b)
5. Develop strategies for backup and restore of operating
systems, applications, configuration settings, and
databases (Lecture c)
6. Decommission systems and data (Lecture c)
Redundancy and Fault Tolerance
• Dependence on EHRs is increasing.
• EHR systems require redundant, or “failover”, resources
and fault tolerance to ensure uptime and data integrity so
that they can perform as specified.
• “Failure” vs. “fault”: a fault is the cause of a failure, where
a failure is the system not complying with its specifications
or precise requirements.
• Fault tolerance is resilience in a system: the ability to
continue performing to specification despite problems.
• Ask vendor how fault tolerance is designed/coded into
the EHR application.
Creating Fault Tolerance
• Redundancy
– Secondary or “backup” systems
• Reliability
– Infrequent failure
– Redundant components
• Availability
– Accessible when needed: no downtime
– Available systems are reliable and accessible
• Computer hardware
– Servers and workstations
• Data storage
– Hard disks
• Network and power
– Network switches and Internet access
– Mains, generators, batteries
• Virtualization
– Isolation of system from hardware
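To make the link between redundancy and availability concrete, here is a minimal sketch. It uses the standard steady-state formula (availability = MTBF / (MTBF + MTTR)) and assumes that redundant copies fail independently, which real systems only approximate. The server figures are hypothetical and do not come from the lecture.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time a component is up.

    mtbf_hours: mean time between failures
    mttr_hours: mean time to repair
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent components in parallel.

    The group is down only when every copy is down at the same time.
    """
    return 1.0 - (1.0 - single) ** copies


# Hypothetical server: fails about every 1,000 hours, takes 10 hours to repair.
one_server = availability(mtbf_hours=1000, mttr_hours=10)
failover_pair = redundant_availability(one_server, copies=2)
print(f"single server: {one_server:.4%}")
print(f"failover pair: {failover_pair:.4%}")
```

Under these assumptions a second server removes most of the expected downtime, which is why redundancy is listed first.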
System Failure and Downtime
• Forrester Consulting report on server failure during prior two
years:
– ¾ experienced downtime.
– Only 1% of server outages were resolved within five
minutes.
– 68% had impact on clinical activities.
– 50+% affected administrative processes.
• How much downtime is acceptable?
– Requires a good understanding of the business’s processes
• Critical system downtime can have a significant negative
impact on patient health
(Forrester Consulting Report, 2010)
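One way to approach "How much downtime is acceptable?" is to turn an availability target into a yearly downtime budget. The sketch below is plain arithmetic; the three targets are illustrative assumptions, not figures from the Forrester report.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_target: float) -> float:
    """Hours per year a system may be down and still meet the target."""
    if not 0.0 <= availability_target <= 1.0:
        raise ValueError("availability_target must be between 0 and 1")
    return (1.0 - availability_target) * HOURS_PER_YEAR


# Illustrative targets only; the right one depends on the business process.
for target in (0.99, 0.999, 0.9999):
    hours = allowed_downtime_hours(target)
    print(f"{target:.2%} uptime allows {hours:.2f} hours of downtime per year")
```

Comparing these budgets with how long a clinic can work without the EHR helps decide which target is worth paying for.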
Three Areas for Fault Tolerance
1. Hardware fault tolerance – compensate for hardware failure
– Often simplest to implement
– Extra hardware resources as secondaries or backups
– E.g., secondary network cards, error checking and correcting (ECC)
memory, redundant power supplies, redundant disks / file storage
2. Software fault tolerance – compensate for poor programming or data
– Involves program verification (code review) and assertion checking
– Compensating for faults such as poorly formatted input data
– E.g., sanity check, double-entry comparison, and multiple-version
programs
3. System fault tolerance – compensate for non-computer or inter-device
failures
– Most complex, highest number of variables
– System may include facilities that are not computer-based
– E.g., detection of sensor failure, graceful reaction to intersystem
communication failure, graceful shutdown in unexpected circumstances
(A Conceptual Framework for System Fault Tolerance - 1.1 What is a System?, 1995)
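Two of the software fault tolerance examples above, the sanity check and the double-entry comparison, can be sketched in a few lines. The function names, the heart-rate field, and the 20 to 300 bpm range are hypothetical choices for illustration and do not come from any particular EHR product.

```python
def sanity_check_heart_rate(value: str) -> int:
    """Reject input that is malformed or physiologically implausible.

    The 20-300 bpm range is an illustrative assumption, not a clinical rule.
    """
    try:
        bpm = int(value.strip())
    except ValueError:
        raise ValueError(f"not a whole number: {value!r}") from None
    if not 20 <= bpm <= 300:
        raise ValueError(f"heart rate out of plausible range: {bpm}")
    return bpm


def double_entry_matches(first: str, second: str) -> bool:
    """Compare two independent entries of the same field.

    Case and surrounding whitespace are ignored; anything else must agree.
    """
    return first.strip().casefold() == second.strip().casefold()


print(sanity_check_heart_rate(" 72 "))
print(double_entry_matches("MRN-00123", " mrn-00123 "))
print(double_entry_matches("MRN-00123", "MRN-00132"))
```

Both checks stop poorly formatted data at the point of entry, before it can cause a failure further along.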
Six Rules of Fault Tolerance
In “A Conceptual Framework for System Fault Tolerance”,
the Center For High Integrity Software Systems Assurance
summarizes six rules:
1. Know precisely what the system is supposed to do.
2. Look at what can go wrong.
3. Study your application & determine appropriate fault
containment regions & earliest feasible time to deal with
potential faults.
4. Completely understand application requirements & use
them to make appropriate time/space trade-offs.
5. Concentrate on credible faults first.
6. Determine application failure margins.
(A Conceptual Framework for System Fault Tolerance - 5 Putting It All Together, 1995)
Six Rules of Fault Tolerance
(cont’d)
• Rule 1: Know precisely what the system is
supposed to do.
– How long can system be allowed to deviate from
specifications before being declared a “failure”?
– What abnormal conditions must be
accommodated?
• Rule 2: Look at what can go wrong.
– Group causes into classes.
– Define “fault floor”.
(A Conceptual Framework for System Fault Tolerance - 5 Putting It All Together, 1995)
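Rule 1 asks how long a system may deviate from its specification before the deviation is declared a failure. The sketch below shows one possible way to encode that tolerance. The function name, the sample format, and the 25-second threshold are assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple


def first_failure_time(
    samples: Iterable[Tuple[float, bool]],
    tolerance_seconds: float,
) -> Optional[float]:
    """Return the timestamp at which a deviation becomes a failure.

    `samples` are (timestamp_seconds, within_spec) pairs in time order.
    A deviation counts as a failure once the system has been out of
    specification continuously for longer than `tolerance_seconds`.
    Returns None if that never happens.
    """
    deviation_start: Optional[float] = None
    for timestamp, within_spec in samples:
        if within_spec:
            deviation_start = None       # recovered, so reset the clock
        elif deviation_start is None:
            deviation_start = timestamp  # a new deviation begins here
        elif timestamp - deviation_start > tolerance_seconds:
            return timestamp
    return None


# Hypothetical readings; False means the system was out of specification.
readings = [(0, True), (10, False), (20, False), (30, True),
            (40, False), (50, False), (60, False), (80, False)]
print(first_failure_time(readings, tolerance_seconds=25))
```

The short deviation that recovers at 30 seconds is tolerated. Only the later, longer one is reported as a failure.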
Six Rules of Fault Tolerance
(cont’d – 2)
• Rule 3: Study your application & determine
appropriate fault containment regions & earliest
feasible time to deal with potential faults.
– Fault tolerance generally means more resources
(time & space)
• Rule 4: Completely understand application
requirements & use them to make appropriate
time/space trade-offs.
– Consider costs, & classify faults by likelihood.
(A Conceptual Framework for System Fault Tolerance - 5 Putting It All Together, 1995)
Six Rules of Fault Tolerance
(cont’d – 3)
• Rule 5: Concentrate on credible faults first.
– Ignore less likely faults unless they require little
additional cost. Mitigate the most likely faults first.
• Rule 6: Determine application failure margins.
– Balance the degree of fault tolerance needed with the
cost of implementation.
– Does a small expenditure now save a great deal
later?
(A Conceptual Framework for System Fault Tolerance - 5 Putting It All Together, 1995)
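Rules 4 to 6 weigh likelihood against cost. One simple way to sketch that trade-off is to rank faults by expected yearly loss (probability times loss) and compare each with the cost of mitigating it. Ranking by expected loss is an assumption of this sketch, and so is the idea that mitigation removes the loss entirely. Every figure below is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fault:
    name: str
    yearly_probability: float  # chance of occurring in a given year
    loss_if_occurs: float      # estimated cost of one occurrence
    mitigation_cost: float     # yearly cost of tolerating the fault

    @property
    def expected_loss(self) -> float:
        return self.yearly_probability * self.loss_if_occurs

    @property
    def worth_mitigating(self) -> bool:
        # Simplification: assumes mitigation removes the loss entirely.
        return self.mitigation_cost < self.expected_loss


# All figures below are invented for illustration.
faults = [
    Fault("data center flood", 0.001, 500_000, 40_000),
    Fault("disk failure", 0.30, 20_000, 1_500),
    Fault("power outage", 0.50, 8_000, 2_000),
]

# Rule 5: deal with the faults that matter most first.
ranked = sorted(faults, key=lambda f: f.expected_loss, reverse=True)
for fault in ranked:
    decision = "mitigate" if fault.worth_mitigating else "accept for now"
    print(f"{fault.name}: expected loss {fault.expected_loss:,.0f} -> {decision}")
```

With these numbers the flood is the costliest single event, yet it ranks last because it is so unlikely and so expensive to protect against.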
Risk Assessment
• Identify what is to be protected
– Examples: EHR server, or clinical record
– Include rating of importance
– Types of loss or liability
• Identify risks to each component
– Examples: Power failure, or record alteration
– Risk = Threat x Probability x Impact
– Intentional or Accidental, Human or System, Internal or External
• Identify mitigation strategies for each risk
– Examples: UPS with power monitoring, or automatic backup
– Policies (for people) or Controls (for systems or equipment)
(Benson, n.d.; Maniscalchi, 2009)
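The three steps above can be recorded in a small risk register. The sketch below applies the slide's formula, Risk = Threat x Probability x Impact, and assumes each factor is rated on a 1 to 5 scale. The scale and the ratings are hypothetical. The assets, risks, and mitigations are the examples from the slide.

```python
from typing import NamedTuple


class RiskEntry(NamedTuple):
    asset: str
    risk: str
    threat: int        # 1 (minor) to 5 (severe); the scale is an assumption
    probability: int   # 1 (rare) to 5 (frequent)
    impact: int        # 1 (negligible) to 5 (critical)
    mitigation: str

    def score(self) -> int:
        """Risk = Threat x Probability x Impact."""
        return self.threat * self.probability * self.impact


# Ratings are invented for illustration.
register = [
    RiskEntry("clinical record", "record alteration", 4, 2, 5, "automatic backup"),
    RiskEntry("EHR server", "power failure", 3, 4, 5, "UPS with power monitoring"),
]

# Highest-scoring risks are reviewed first.
ordered = sorted(register, key=RiskEntry.score, reverse=True)
for entry in ordered:
    print(f"{entry.score():>3}  {entry.asset}: {entry.risk} -> {entry.mitigation}")
```

Keeping the mitigation next to each score makes it easy to see which policies or controls cover the highest-rated risks.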
Summary – Lecture a
• Fault tolerance is running despite
problems
• Implemented using Redundancy to
increase Reliability and provide Availability
• Three areas of Hardware, Software, and
System
• Risk assessment to identify assets, risks,
and mitigation
References – Lecture a
Benson, C. (n.d.). Security Planning. Available from: http://technet.microsoft.com/en-us/library/cc723503.aspx
Maniscalchi, J. (2009, June). Threat vs. Vulnerability vs. Risk. Available from: https://www.pinkerton.com/blog/risk-vs-threat-vs-vulnerability-and-why-you-should-know-the-differences/
Heimerdinger, W. L., & Weinstock, C. B. (1992, October). A Conceptual Framework for System Fault Tolerance. Retrieved from: http://www.sei.cmu.edu/reports/92tr033.pdf
Forrester Research, Inc. (2010, January). Server Availability Trends In The Time Of Electronic Health Records. Available at: http://www.stratus.com/assets/ServerAvailabilityTrends_EHR_ForresterPaper.pdf
14

More Related Content

What's hot

Safety specification (CS 5032 2012)
Safety specification (CS 5032 2012)Safety specification (CS 5032 2012)
Safety specification (CS 5032 2012)Ian Sommerville
 
Priliminary hazard analysis
Priliminary hazard analysisPriliminary hazard analysis
Priliminary hazard analysisAmansharma1378
 
A survey of online failure prediction methods
A survey of online failure prediction methodsA survey of online failure prediction methods
A survey of online failure prediction methodsunyil96
 
Failure analysis buisness impact-backup-archive
Failure analysis buisness impact-backup-archiveFailure analysis buisness impact-backup-archive
Failure analysis buisness impact-backup-archiveDavin Abraham
 
3.10 Introducing large ict systems into organisations
3.10 Introducing large ict systems into organisations3.10 Introducing large ict systems into organisations
3.10 Introducing large ict systems into organisationsmrmwood
 
software Ch01
software Ch01software Ch01
software Ch01liincn
 
Reliability centred maintenance service types & schematic
Reliability centred maintenance service    types & schematicReliability centred maintenance service    types & schematic
Reliability centred maintenance service types & schematicvasishta bhargava
 
Introduction to real time software systems script
Introduction to real time software systems scriptIntroduction to real time software systems script
Introduction to real time software systems scriptsommerville-videos
 
Candelaria montague final_project
Candelaria montague final_projectCandelaria montague final_project
Candelaria montague final_projectcandymontague
 
L5 Dependability Requirements
L5 Dependability RequirementsL5 Dependability Requirements
L5 Dependability RequirementsIan Sommerville
 
Backup And Recovery
Backup And RecoveryBackup And Recovery
Backup And RecoveryWynthorpe
 
Creating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery PlanCreating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery PlanRishu Mehra
 
Ch13 Business Continuity Planning and Procedures
Ch13 Business Continuity Planning and ProceduresCh13 Business Continuity Planning and Procedures
Ch13 Business Continuity Planning and ProceduresInformation Technology
 
Application and Systems Development
Application and Systems DevelopmentApplication and Systems Development
Application and Systems Developmentamiable_indian
 
Sam Antley Resume
Sam Antley ResumeSam Antley Resume
Sam Antley ResumeSam Antley
 

What's hot (20)

Safety specification (CS 5032 2012)
Safety specification (CS 5032 2012)Safety specification (CS 5032 2012)
Safety specification (CS 5032 2012)
 
Priliminary hazard analysis
Priliminary hazard analysisPriliminary hazard analysis
Priliminary hazard analysis
 
Khrystyne_M_Sganga
Khrystyne_M_SgangaKhrystyne_M_Sganga
Khrystyne_M_Sganga
 
A survey of online failure prediction methods
A survey of online failure prediction methodsA survey of online failure prediction methods
A survey of online failure prediction methods
 
Failure analysis buisness impact-backup-archive
Failure analysis buisness impact-backup-archiveFailure analysis buisness impact-backup-archive
Failure analysis buisness impact-backup-archive
 
Ch3
Ch3Ch3
Ch3
 
Ch3
Ch3Ch3
Ch3
 
3.10 Introducing large ict systems into organisations
3.10 Introducing large ict systems into organisations3.10 Introducing large ict systems into organisations
3.10 Introducing large ict systems into organisations
 
software Ch01
software Ch01software Ch01
software Ch01
 
THERP
THERPTHERP
THERP
 
jignesh JD
jignesh JDjignesh JD
jignesh JD
 
Reliability centred maintenance service types & schematic
Reliability centred maintenance service    types & schematicReliability centred maintenance service    types & schematic
Reliability centred maintenance service types & schematic
 
Introduction to real time software systems script
Introduction to real time software systems scriptIntroduction to real time software systems script
Introduction to real time software systems script
 
Candelaria montague final_project
Candelaria montague final_projectCandelaria montague final_project
Candelaria montague final_project
 
L5 Dependability Requirements
L5 Dependability RequirementsL5 Dependability Requirements
L5 Dependability Requirements
 
Backup And Recovery
Backup And RecoveryBackup And Recovery
Backup And Recovery
 
Creating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery PlanCreating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery Plan
 
Ch13 Business Continuity Planning and Procedures
Ch13 Business Continuity Planning and ProceduresCh13 Business Continuity Planning and Procedures
Ch13 Business Continuity Planning and Procedures
 
Application and Systems Development
Application and Systems DevelopmentApplication and Systems Development
Application and Systems Development
 
Sam Antley Resume
Sam Antley ResumeSam Antley Resume
Sam Antley Resume
 

Similar to Comp8 unit9a lecture_slides

Comp8 unit6b lecture_slides
Comp8 unit6b lecture_slidesComp8 unit6b lecture_slides
Comp8 unit6b lecture_slidesCMDLMS
 
Comp8 unit9b lecture_slides
Comp8 unit9b lecture_slidesComp8 unit9b lecture_slides
Comp8 unit9b lecture_slidesCMDLMS
 
Comp8 unit9c lecture_slides
Comp8 unit9c lecture_slidesComp8 unit9c lecture_slides
Comp8 unit9c lecture_slidesCMDLMS
 
How Enterprise Architects Can Build Resilient, Reliable Software-Based Health...
How Enterprise Architects Can Build Resilient, Reliable Software-Based Health...How Enterprise Architects Can Build Resilient, Reliable Software-Based Health...
How Enterprise Architects Can Build Resilient, Reliable Software-Based Health...Cognizant
 
Comp8 unit8b lecture_slides
Comp8 unit8b lecture_slidesComp8 unit8b lecture_slides
Comp8 unit8b lecture_slidesCMDLMS
 
What is dr and bc 12-2017
What is dr and bc 12-2017What is dr and bc 12-2017
What is dr and bc 12-2017Atef Yassin
 
The Six Stages of Incident Response - Auscert 2016
The Six Stages of Incident Response - Auscert 2016The Six Stages of Incident Response - Auscert 2016
The Six Stages of Incident Response - Auscert 2016Ashley Deuble
 
The Six Stages of Incident Response
The Six Stages of Incident Response The Six Stages of Incident Response
The Six Stages of Incident Response Darren Pauli
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance Systemprakashjjaya
 
High dependability of the automated systems
High dependability of the automated systemsHigh dependability of the automated systems
High dependability of the automated systemsAlan Tatourian
 
Dependable Systems - Summary (16/16)
Dependable Systems - Summary (16/16)Dependable Systems - Summary (16/16)
Dependable Systems - Summary (16/16)Peter Tröger
 
Planning and Deploying an Effective Vulnerability Management Program
Planning and Deploying an Effective Vulnerability Management ProgramPlanning and Deploying an Effective Vulnerability Management Program
Planning and Deploying an Effective Vulnerability Management ProgramSasha Nunke
 
Understanding Test Environments Management
Understanding Test Environments ManagementUnderstanding Test Environments Management
Understanding Test Environments ManagementEnov8
 
Requirement Engineering for Dependable Systems
Requirement Engineering for Dependable SystemsRequirement Engineering for Dependable Systems
Requirement Engineering for Dependable SystemsKamalika Guha Roy
 
Dependable Systems -Dependability Means (3/16)
Dependable Systems -Dependability Means (3/16)Dependable Systems -Dependability Means (3/16)
Dependable Systems -Dependability Means (3/16)Peter Tröger
 
Performance Testing Using VS 2010 - Part 1
Performance Testing Using VS 2010 - Part 1Performance Testing Using VS 2010 - Part 1
Performance Testing Using VS 2010 - Part 1Mohamed Tarek
 

Similar to Comp8 unit9a lecture_slides (20)

Comp8 unit6b lecture_slides
Comp8 unit6b lecture_slidesComp8 unit6b lecture_slides
Comp8 unit6b lecture_slides
 
Comp8 unit9b lecture_slides
Comp8 unit9b lecture_slidesComp8 unit9b lecture_slides
Comp8 unit9b lecture_slides
 
Comp8 unit9c lecture_slides
Comp8 unit9c lecture_slidesComp8 unit9c lecture_slides
Comp8 unit9c lecture_slides
 
How Enterprise Architects Can Build Resilient, Reliable Software-Based Health...
How Enterprise Architects Can Build Resilient, Reliable Software-Based Health...How Enterprise Architects Can Build Resilient, Reliable Software-Based Health...
How Enterprise Architects Can Build Resilient, Reliable Software-Based Health...
 
Comp8 unit8b lecture_slides
Comp8 unit8b lecture_slidesComp8 unit8b lecture_slides
Comp8 unit8b lecture_slides
 
What is dr and bc 12-2017
What is dr and bc 12-2017What is dr and bc 12-2017
What is dr and bc 12-2017
 
The Six Stages of Incident Response - Auscert 2016
The Six Stages of Incident Response - Auscert 2016The Six Stages of Incident Response - Auscert 2016
The Six Stages of Incident Response - Auscert 2016
 
The Six Stages of Incident Response
The Six Stages of Incident Response The Six Stages of Incident Response
The Six Stages of Incident Response
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance System
 
High dependability of the automated systems
High dependability of the automated systemsHigh dependability of the automated systems
High dependability of the automated systems
 
Dependable Systems - Summary (16/16)
Dependable Systems - Summary (16/16)Dependable Systems - Summary (16/16)
Dependable Systems - Summary (16/16)
 
Planning and Deploying an Effective Vulnerability Management Program
Planning and Deploying an Effective Vulnerability Management ProgramPlanning and Deploying an Effective Vulnerability Management Program
Planning and Deploying an Effective Vulnerability Management Program
 
Understanding Test Environments Management
Understanding Test Environments ManagementUnderstanding Test Environments Management
Understanding Test Environments Management
 
A075434624
A075434624A075434624
A075434624
 
Requirement Engineering for Dependable Systems
Requirement Engineering for Dependable SystemsRequirement Engineering for Dependable Systems
Requirement Engineering for Dependable Systems
 
Dependable Systems -Dependability Means (3/16)
Dependable Systems -Dependability Means (3/16)Dependable Systems -Dependability Means (3/16)
Dependable Systems -Dependability Means (3/16)
 
BIS Ch 4.ppt
BIS Ch 4.pptBIS Ch 4.ppt
BIS Ch 4.ppt
 
Ads7 deflorio
Ads7 deflorioAds7 deflorio
Ads7 deflorio
 
Performance Testing Using VS 2010 - Part 1
Performance Testing Using VS 2010 - Part 1Performance Testing Using VS 2010 - Part 1
Performance Testing Using VS 2010 - Part 1
 
Hospital management system
Hospital management systemHospital management system
Hospital management system
 

More from CMDLMS

Culture of healthcare_ week 1_ lecture_slides
Culture of healthcare_ week 1_ lecture_slidesCulture of healthcare_ week 1_ lecture_slides
Culture of healthcare_ week 1_ lecture_slidesCMDLMS
 
Why bother
Why botherWhy bother
Why botherCMDLMS
 
Ensuring two way communications
Ensuring two way communicationsEnsuring two way communications
Ensuring two way communicationsCMDLMS
 
Human Development
Human DevelopmentHuman Development
Human DevelopmentCMDLMS
 
Lecture 11A
Lecture 11ALecture 11A
Lecture 11ACMDLMS
 
lecture C
lecture Clecture C
lecture CCMDLMS
 
lecture 11B
lecture 11Blecture 11B
lecture 11BCMDLMS
 
lecture 10a
lecture 10alecture 10a
lecture 10aCMDLMS
 
lecture 9 B
lecture 9 Blecture 9 B
lecture 9 BCMDLMS
 
Lecture 9 A
Lecture 9 ALecture 9 A
Lecture 9 ACMDLMS
 
Lecture 9C
Lecture 9CLecture 9C
Lecture 9CCMDLMS
 
Lecture 8B
Lecture 8BLecture 8B
Lecture 8BCMDLMS
 
Lecture 8A
Lecture 8ALecture 8A
Lecture 8ACMDLMS
 
Lecture 7B
Lecture 7BLecture 7B
Lecture 7BCMDLMS
 
Lecture C
Lecture CLecture C
Lecture CCMDLMS
 
lecture 7A
lecture 7Alecture 7A
lecture 7ACMDLMS
 
Lecture 6B
Lecture 6BLecture 6B
Lecture 6BCMDLMS
 
Lecture 6A
Lecture 6ALecture 6A
Lecture 6ACMDLMS
 
Lecture 5B
Lecture 5BLecture 5B
Lecture 5BCMDLMS
 
Lecture 5 A
Lecture 5 A Lecture 5 A
Lecture 5 A CMDLMS
 

More from CMDLMS (20)

Culture of healthcare_ week 1_ lecture_slides
Culture of healthcare_ week 1_ lecture_slidesCulture of healthcare_ week 1_ lecture_slides
Culture of healthcare_ week 1_ lecture_slides
 
Why bother
Why botherWhy bother
Why bother
 
Ensuring two way communications
Ensuring two way communicationsEnsuring two way communications
Ensuring two way communications
 
Human Development
Human DevelopmentHuman Development
Human Development
 
Lecture 11A
Lecture 11ALecture 11A
Lecture 11A
 
lecture C
lecture Clecture C
lecture C
 
lecture 11B
lecture 11Blecture 11B
lecture 11B
 
lecture 10a
lecture 10alecture 10a
lecture 10a
 
lecture 9 B
lecture 9 Blecture 9 B
lecture 9 B
 
Lecture 9 A
Lecture 9 ALecture 9 A
Lecture 9 A
 
Lecture 9C
Lecture 9CLecture 9C
Lecture 9C
 
Lecture 8B
Lecture 8BLecture 8B
Lecture 8B
 
Lecture 8A
Lecture 8ALecture 8A
Lecture 8A
 
Lecture 7B
Lecture 7BLecture 7B
Lecture 7B
 
Lecture C
Lecture CLecture C
Lecture C
 
lecture 7A
lecture 7Alecture 7A
lecture 7A
 
Lecture 6B
Lecture 6BLecture 6B
Lecture 6B
 
Lecture 6A
Lecture 6ALecture 6A
Lecture 6A
 
Lecture 5B
Lecture 5BLecture 5B
Lecture 5B
 
Lecture 5 A
Lecture 5 A Lecture 5 A
Lecture 5 A
 

Recently uploaded

College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...delhimodelshub1
 
hyderabad call girl.pdfRussian Call Girls in Hyderabad Amrita 9907093804 Inde...
hyderabad call girl.pdfRussian Call Girls in Hyderabad Amrita 9907093804 Inde...hyderabad call girl.pdfRussian Call Girls in Hyderabad Amrita 9907093804 Inde...
hyderabad call girl.pdfRussian Call Girls in Hyderabad Amrita 9907093804 Inde...delhimodelshub1
 
Kukatpally Call Girls Services 9907093804 High Class Babes Here Call Now
Kukatpally Call Girls Services 9907093804 High Class Babes Here Call NowKukatpally Call Girls Services 9907093804 High Class Babes Here Call Now
Kukatpally Call Girls Services 9907093804 High Class Babes Here Call NowHyderabad Call Girls Services
 
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012Call Girls Service Gurgaon
 
Call Girls Hyderabad Kirti 9907093804 Independent Escort Service Hyderabad
Call Girls Hyderabad Kirti 9907093804 Independent Escort Service HyderabadCall Girls Hyderabad Kirti 9907093804 Independent Escort Service Hyderabad
Call Girls Hyderabad Kirti 9907093804 Independent Escort Service Hyderabaddelhimodelshub1
 
College Call Girls Mumbai Alia 9910780858 Independent Escort Service Mumbai
College Call Girls Mumbai Alia 9910780858 Independent Escort Service MumbaiCollege Call Girls Mumbai Alia 9910780858 Independent Escort Service Mumbai
College Call Girls Mumbai Alia 9910780858 Independent Escort Service Mumbaisonalikaur4
 
Call Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any TimeCall Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any Timedelhimodelshub1
 
Basics of Anatomy- Language of Anatomy.pptx
Basics of Anatomy- Language of Anatomy.pptxBasics of Anatomy- Language of Anatomy.pptx
Basics of Anatomy- Language of Anatomy.pptxAyush Gupta
 
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...High Profile Call Girls Chandigarh Aarushi
 
Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...
Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...
Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...High Profile Call Girls Chandigarh Aarushi
 
pOOJA sexy Call Girls In Sector 49,9999965857 Young Female Escorts Service In...
pOOJA sexy Call Girls In Sector 49,9999965857 Young Female Escorts Service In...pOOJA sexy Call Girls In Sector 49,9999965857 Young Female Escorts Service In...
pOOJA sexy Call Girls In Sector 49,9999965857 Young Female Escorts Service In...Call Girls Noida
 
Russian Escorts Delhi | 9711199171 | all area service available
Russian Escorts Delhi | 9711199171 | all area service availableRussian Escorts Delhi | 9711199171 | all area service available
Russian Escorts Delhi | 9711199171 | all area service availablesandeepkumar69420
 
Russian Escorts Aishbagh Road * 9548273370 Naughty Call Girls Service in Lucknow
Russian Escorts Aishbagh Road * 9548273370 Naughty Call Girls Service in LucknowRussian Escorts Aishbagh Road * 9548273370 Naughty Call Girls Service in Lucknow
Russian Escorts Aishbagh Road * 9548273370 Naughty Call Girls Service in Lucknowgragteena
 
Gurgaon Sector 68 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...
Gurgaon Sector 68 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...Gurgaon Sector 68 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...
Gurgaon Sector 68 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...ggsonu500
 
Gurgaon Sector 90 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...
Gurgaon Sector 90 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...Gurgaon Sector 90 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...
Gurgaon Sector 90 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...ggsonu500
 
Russian Call Girls in Goa Samaira 7001305949 Independent Escort Service Goa
Russian Call Girls in Goa Samaira 7001305949 Independent Escort Service GoaRussian Call Girls in Goa Samaira 7001305949 Independent Escort Service Goa
Russian Call Girls in Goa Samaira 7001305949 Independent Escort Service Goanarwatsonia7
 
Russian Call Girls in Hyderabad Ishita 9907093804 Independent Escort Service ...
Russian Call Girls in Hyderabad Ishita 9907093804 Independent Escort Service ...Russian Call Girls in Hyderabad Ishita 9907093804 Independent Escort Service ...
Russian Call Girls in Hyderabad Ishita 9907093804 Independent Escort Service ...delhimodelshub1
 
No Advance 9053900678 Chandigarh Call Girls , Indian Call Girls For Full Ni...
No Advance 9053900678 Chandigarh  Call Girls , Indian Call Girls  For Full Ni...No Advance 9053900678 Chandigarh  Call Girls , Indian Call Girls  For Full Ni...
No Advance 9053900678 Chandigarh Call Girls , Indian Call Girls For Full Ni...Vip call girls In Chandigarh
 

Recently uploaded (20)

College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
 
hyderabad call girl.pdfRussian Call Girls in Hyderabad Amrita 9907093804 Inde...
hyderabad call girl.pdfRussian Call Girls in Hyderabad Amrita 9907093804 Inde...hyderabad call girl.pdfRussian Call Girls in Hyderabad Amrita 9907093804 Inde...
hyderabad call girl.pdfRussian Call Girls in Hyderabad Amrita 9907093804 Inde...
 
Kukatpally Call Girls Services 9907093804 High Class Babes Here Call Now
Kukatpally Call Girls Services 9907093804 High Class Babes Here Call NowKukatpally Call Girls Services 9907093804 High Class Babes Here Call Now
Kukatpally Call Girls Services 9907093804 High Class Babes Here Call Now
 
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
 
Call Girls Hyderabad Kirti 9907093804 Independent Escort Service Hyderabad
Call Girls Hyderabad Kirti 9907093804 Independent Escort Service HyderabadCall Girls Hyderabad Kirti 9907093804 Independent Escort Service Hyderabad
Call Girls Hyderabad Kirti 9907093804 Independent Escort Service Hyderabad
 
College Call Girls Mumbai Alia 9910780858 Independent Escort Service Mumbai
College Call Girls Mumbai Alia 9910780858 Independent Escort Service MumbaiCollege Call Girls Mumbai Alia 9910780858 Independent Escort Service Mumbai
College Call Girls Mumbai Alia 9910780858 Independent Escort Service Mumbai
 
Call Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any TimeCall Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any Time
 
Basics of Anatomy- Language of Anatomy.pptx
Basics of Anatomy- Language of Anatomy.pptxBasics of Anatomy- Language of Anatomy.pptx
Basics of Anatomy- Language of Anatomy.pptx
 
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
 
Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...
Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...
Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...
 
Call Girl Guwahati Aashi 👉 7001305949 👈 🔝 Independent Escort Service Guwahati

Comp8 unit9a lecture_slides

  • 1. Installation and Maintenance of Health IT Systems Creating Fault-Tolerant Systems, Backups, and Decommissioning Lecture a This material (Comp 8 Unit 9) was developed by Duke University, funded by the Department of Health and Human Services, Office of the National Coordinator for Health Information Technology under Award Number IU24OC000024. This material was updated in 2016 by The University of Texas Health Science Center at Houston under Award Number 90WT0006. This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.
  • 2. Creating Fault-Tolerant Systems, Backups, and Decommissioning Learning Objectives 1. Define availability, reliability, redundancy, and fault tolerance (Lecture a) 2. Explain areas and outline rules for implementing fault tolerant systems (Lecture a) 3. Perform risk assessment (Lecture a) 4. Follow best practice guidelines for common implementations (Lecture b) 5. Develop strategies for backup and restore of operating systems, applications, configuration settings, and databases (Lecture c) 6. Decommission systems and data (Lecture c) 2
  • 3. Redundancy and Fault Tolerance • Dependence on EHRs is increasing. • EHR systems require redundant, or “failover”, resources and fault tolerance to ensure uptime and data integrity so that it can perform as specified. • “Failure” vs “fault”: fault is the cause of a failure of the system to comply with its specifications or precise requirements. • Fault tolerance is resilience in a system, or ability to continue performing to specification despite problems • Ask vendor how fault tolerance is designed/coded into the EHR application. 3
  • 4. Creating Fault Tolerance • Redundancy – Secondary or “backup” systems • Reliability – Infrequent failure – Redundant components • Availability – Accessible when needed – no downtime – Available systems are reliable and accessible • Computer hardware – Servers and workstations • Data storage – Hard disks • Network and Power – Network switches and Internet access – Mains, generators, batteries • Virtualization – Isolation of system from hardware 4
  • 5. System Failure and Downtime • Forrester Consulting report on server failure during prior two years: – ¾ experienced downtime. – Only 1% of server outages were resolved within five minutes. – 68% had impact on clinical activities. – 50+% affected administrative processes. • How much downtime is acceptable? – Requires a good understanding of the business’s processes. Critical system downtime can have a significant negative impact on patient health. (Forrester Consulting Report, 2010) 5
  • 6. Three Areas for Fault Tolerance 1. Hardware fault tolerance – compensate for hardware failure – Often simplest to implement – Extra hardware resources as secondaries or backups – E.g., secondary network cards, error checking and correcting (ECC) memory, redundant power supplies, redundant disks / file storage 2. Software fault tolerance – compensate for poor programming or data – Involves program verification (code review) and assertion checking – Compensating for faults such as poorly formatted input data – E.g., sanity check, double-entry comparison, and multiple-version programs 3. System fault tolerance – compensate for non-computer or inter-device failures – Most complex, highest number of variables – System may include facilities that are not computer-based – E.g., detection of sensor failure, graceful reaction to intersystem communication failure, graceful shutdown in unexpected circumstances (A Conceptual Framework for System Fault Tolerance - 1.1 What is a System?, 1995) 6
  • 7. Six Rules of Fault Tolerance In “A Conceptual Framework for Systems Fault Tolerance”, the Center For High Integrity Software Systems Assurance summarizes 6 rules: 1.Know precisely what the system is supposed to do. 2.Look at what can go wrong. 3.Study your application & determine appropriate fault containment regions & earliest feasible time to deal with potential faults. 4.Completely understand application requirements & use them to make appropriate time/space trade-offs. 5.Concentrate on credible faults first. 6.Determine application failure margins. (A Conceptual Framework for System Fault Tolerance - 5 Putting It All Together, 1995) 7
  • 8. Six Rules of Fault Tolerance (cont’d) • Rule 1: Know precisely what the system is supposed to do. – How long can system be allowed to deviate from specifications before being declared a “failure”? – What abnormal conditions must be accommodated? • Rule 2: Look at what can go wrong. – Group causes into classes. – Define “fault floor”. (A Conceptual Framework for System Fault Tolerance - 5 Putting It All Together, 1995) 8
  • 9. Six Rules of Fault Tolerance (cont’d – 2) • Rule 3: Study your application & determine appropriate fault containment regions & earliest feasible time to deal with potential faults. – Fault tolerance generally means more resources (time & space) • Rule 4: Completely understand application requirements & use them to make appropriate time/space trade-offs. – Consider costs, & classify faults by likelihood. (A Conceptual Framework for System Fault Tolerance - 5 Putting It All Together, 1995) 9
  • 10. Six Rules of Fault Tolerance (cont’d – 3) • Rule 5: Concentrate on credible faults first. – Ignore less likely faults unless they require little additional cost. Mitigate the most likely faults first. • Rule 6: Determine application failure margins. – Balance the degree of fault tolerance needed with the cost of implementation. – Does a small expenditure now save a great deal later? (A Conceptual Framework for System Fault Tolerance - 5 Putting It All Together, 1995) 10
  • 11. Risk Assessment • Identify what is to be protected – Examples: EHR server, or clinical record – Include rating of importance – Types of loss or liability • Identify risks to each component – Examples: Power failure, or record alteration – Risk = Threat x Probability x Impact – Intentional or Accidental, Human or System, Internal or External • Identify mitigation strategies for each risk – Examples: UPS with power monitoring, or automatic backup – Policies (for people) or Controls (for systems or equipment) (Benson, n.d.; Maniscalchi, 2009) 11
  • 12. Creating Fault-Tolerant Systems, Backups, and Decommissioning Summary – Lecture a • Fault tolerance is running despite problems • Implemented using Redundancy to increase Reliability and provide Availability • Three areas of Hardware, Software, and System • Risk assessment to identify assets, risks, and mitigation 12
  • 13. Creating Fault-Tolerant Systems, Backups, and Decommissioning References – Lecture a Benson, C. Security Planning. (n.d.) Available from: http://technet.microsoft.com/en-us/library/cc723503.aspx Maniscalchi, J. Threat vs. Vulnerability vs. Risk. (June 2009) Available from: https://www.pinkerton.com/blog/risk-vs-threat-vs-vulnerability-and-why-you-should-know-the-differences/ Heimerdinger, W. L., Weinstock, C. B., A Conceptual Framework for System Fault Tolerance (1992, October). Retrieved from: http://www.sei.cmu.edu/reports/92tr033.pdf Server Availability Trends In The Time Of Electronic Health Records. (January 2010) Forrester Research, Inc. Available at http://www.stratus.com/assets/ServerAvailabilityTrends_EHR_ForresterPaper.pdf 13
  • 14. Installation and Maintenance of Health IT Systems Creating Fault-Tolerant Systems, Backups, and Decommissioning Lecture a This material was developed by Duke University, funded by the Department of Health and Human Services, Office of the National Coordinator for Health Information Technology under Award Number IU24OC000024. This material was updated in 2016 by The University of Texas Health Science Center at Houston under Award Number 90WT0006. 14
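
The software fault tolerance techniques named on slide 6 (sanity check and double-entry comparison) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the lecture: the field names (`patient_id`, `drug`, `dose_mg`) and both function names are hypothetical, and a real EHR would apply its own validation rules.

```python
# Hypothetical required fields for a medication order; not taken from any real EHR.
REQUIRED_FIELDS = ("patient_id", "drug", "dose_mg")


def sanity_check(record: dict) -> list[str]:
    """Return a list of problems found; an empty list means the record passed."""
    problems = []
    # Is every required field present and non-empty?
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"missing required field: {field}")
    # Is the dose a plausible number (assumed rule: must be positive)?
    dose = record.get("dose_mg")
    if dose is not None and dose != "":
        try:
            if float(dose) <= 0:
                problems.append("dose_mg must be positive")
        except (TypeError, ValueError):
            problems.append("dose_mg is not a number")
    return problems


def double_entry_matches(first: dict, second: dict) -> bool:
    """Double-entry comparison: two independently keyed entries must agree."""
    return first == second
```

A caller would reject or flag a record whenever `sanity_check` returns a non-empty list, so that poorly formatted input is caught before it propagates further into the system.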

Editor's Notes

  1. Welcome to Installation and Maintenance of Health IT Systems, Creating Fault-Tolerant Systems, Backups, and Decommissioning. This is Lecture a. This component, Installation and Maintenance of Health IT Systems, covers fundamentals of selection, installation, and maintenance of typical Electronic Health Records (EHR) systems. This unit, Creating Fault-Tolerant Systems, Backups, and Decommissioning, will discuss ensuring availability and resiliency through fault tolerance, data reliability through backup, and secure decommissioning of EHR systems.
  2. The objectives for this unit, Creating Fault-Tolerant Systems, Backups, and Decommissioning, are to: define availability, reliability, redundancy, and fault tolerance; explain areas and outline rules for implementing fault-tolerant systems; perform risk assessment; follow best practice guidelines for common implementations; develop strategies for backup and restore of operating systems, applications, configuration settings, and databases; and decommission systems and data. As healthcare organizations adopt new technology to improve their efficiency, their dependence on that technology increases exponentially. However, what happens to all of these critical applications if a failure were to occur? What about the integrity of the caregiver’s data in the event of a disaster? In this lecture, we will discuss the importance of fault tolerance and redundancy and look at areas and rules for creating and maintaining a fault-tolerant system.
  3. As I just mentioned, dependence on EHR technology is increasing greatly. This means there is a critical need for EHRs to have redundant, or “failover" resources, and fault tolerance to ensure uptime and data integrity. A system is said to have a failure if the service it provides does not meet expected specifications or requirements. A failure’s cause is called a fault. Fault tolerance is the ability of a system to continue performing to specifications despite problems. So fault tolerance is desirable because it reduces the chance of systems failure.
  4. So let’s focus on a few areas of fault tolerance common to most EHR systems, and identify some best practices for implementation. Redundancy is an excellent general practice. It means ensuring that you have more than one of whatever component is important. Redundant power, for instance, means that if the primary source of electricity is removed, a secondary source is available. Note that redundant systems can be fully or partially redundant; having a second power plant available may be unrealistic, but a battery or generator that can run for some hours is not. Reliability is another good characteristic. The system overall should not fail. Note that reliable systems may have individual components that fail, but if redundant components take over, then the overall system remains reliable. Availability means that the system is there and working correctly when you need it. A reliable system needs to be accessible when needed to maintain high availability. Later slides will introduce fault-tolerant technologies in common system areas, including computer servers and workstations, file and data storage systems, and communications and power networks. Virtualization of these systems will also be discussed.
  5. According to a Forrester Consulting report from 2010, three-quarters of respondents experienced downtime related to a server failure during the past two years. “Sixty-eight percent had an impact on clinical activities, and greater than half affected administrative processes. Rarely was there no impact. Recovery times were typically measured in hours, not minutes. Only 1 percent of server outages were resolved within five minutes. Providers’ strategies for swapping servers and manual failovers are not medical grade.” In the healthcare setting, critical system downtime can equate to a significant potential impact on patient health, possibly even death.
  6. Below are three areas in which to apply fault tolerance. Hardware: The term fault tolerance usually means automatic compensation for faults or failure of computing hardware. This is often the simplest kind of redundancy to implement, though not necessarily the cheapest. Using additional components that are deployed in parallel or other redundant fashion allows the system to operate even if one of the pieces fails. Techniques include redundant power supplies for equipment, error checking and correcting (ECC) memory, additional network interfaces with secondary connections, and redundant data storage. Software: A fault-tolerant hardware platform does not guarantee high availability to the system user. The software must still compensate for faults such as poorly formatted input data. Mechanisms such as a sanity check for processed data (is this a non-zero file, is every required field present), double entry comparison, and multiple-version programs are often used at this level. System: System fault tolerance will compensate for failures in other system facilities that are not computer-based. If required subcomponents are not available, system fault tolerance ensures that other parts react properly, so that data or time is not lost. As a simple example, a printer that is out of paper should stop running the print motor or printing surface to save wear and tear. Also, the system should be able to stop gracefully, allowing normal operation to resume without duplication of effort or waste of resources, such as halting a lab test if precursor tests have not yet been performed.
  7. Now let’s discuss six rules for approaching fault tolerance in your system. The Center For High Integrity Software Systems Assurance summarizes 6 rules for ensuring fault tolerance. These are: Know precisely what the system is supposed to do. Look at what can go wrong. Study your application & determine appropriate fault containment regions & earliest feasible time to deal with potential faults. Completely understand application requirements & use them to make appropriate time/space trade-offs. Concentrate on credible faults first. Determine application failure margins.
  8. “Rule 1: Know precisely what the system is supposed to do. Part of this process should be determining how long a system can be allowed to deviate from its specification before the deviation is declared a failure. However, it is not sufficient to know what the system is supposed to do under normal circumstances. It is also necessary to know what abnormal conditions the system must accommodate. It is virtually impossible to enumerate the set of all possible faults that a system might encounter. It is much more manageable to deal with classes of faults.” This rule requires broad knowledge of the business environment, not just the specific system. For instance, certain deadlines may exist for regulatory paperwork that will result in large fines for late submission. “Rule 2: Look at what can go wrong, and try to group the causes into classes for easier manageability. This involves defining a fault floor based on your ability to diagnose and repair faults. The goal of fault tolerance is to prevent faults from propagating to the system boundary, where it becomes observable and, hence, a failure. In general, the further a fault has propagated, the harder it is to deal with. Since fault tolerance is redundancy management, however, it becomes a matter of the degree of redundancy desired. For instance, it is almost certainly cheaper to deal with memory faults by using error-correcting memory (that is, redundant bits in a memory location) than by providing a ‘shadow’ memory. Note, however, that dealing with faults earlier rather than later may go counter to the advice given above regarding dealing with classes of faults rather than individual faults.” This rule may seem boundless, as any pessimist knows that “things can always get worse.” Grouping into general classes of problems can help. Example classes include failure of hardware equipment, failure of communication transmission, and incorrect data entry.
  9. “Rule 3: Study your application and determine appropriate fault containment regions and the earliest feasible time to deal with potential faults. In general, the price paid for a fault-tolerant system is additional resources, both in terms of time and space. As with most things, these two can be traded off against each other. In some applications, in flight control, for example, timing is everything, even at the cost of extra processors. In general, the comparison approach to fault detection works best in these situations. In other applications, such as a space probe, weight and power consumption is an overriding issue--arguing for a higher reliance on time redundancy and suggesting the use of acceptance tests.” This rule is the beginning of the tradeoff between different requirements. For instance, an error in recording insurance information could be eliminated by requiring verification with the insurance company before proceeding. The additional time and resources to provide that verification may not be worth it, especially if the problem will be seen and manually corrected during the billing process. In contrast, a problem in recording drug delivery or dosage information could threaten the life of a patient. “Rule 4: Completely understand the requirements of your application and use them to make appropriate time/space trade-offs. Protecting a system from every conceivable fault can exhaust another resource--money. This is true even if a rational set of fault classes is defined. The trade-off here is fault coverage versus the cost of that coverage. In all systems, it is possible to classify faults by the likelihood of occurrence.” This rule continues with the idea of appropriate expenditure. An error that is infrequent and has low impact need not receive the kind of attention reserved for an error that happens often or is extremely costly.
  10. “Rule 5: Whenever possible, concentrate on the credible faults and ignore those less likely to occur unless they can be dealt with at little or no additional cost. Time is an essential element in any digital computer system, even in systems that do not claim to be real-time. It is important to define the minimum period of time a system can fail to provide its defined service before a failure is declared. Unnecessarily short failure margins force the system designer to resort to expensive fault-tolerance mechanisms, such as real-time fault masking.” This rule focuses on the likely faults that may occur. Data on these types of faults may already be available for long-term systems, either formally or anecdotally. Unlikely problems should be addressed, but after addressing the likely ones. “Rule 6: Carefully determine application failure margins and use the information to balance the degree of fault tolerance needed with the cost of implementation.” This rule introduces the idea of costs versus benefits. Cost / benefit analysis looks at weighing costs (one-time, ongoing, and indirect) with benefits (savings, greater revenue, higher satisfaction, better patient outcome). Often used in larger budgetary and project decisions, a less formalized version may help decide margins. For instance, a spare PC monitor in storage may cost $150 in initial direct cost, but save money over time if a room that would otherwise be unusable for patients can be brought back up quickly.
  11. Fault tolerance planning and implementation starts with risk assessment, a three-step process. Begin by identifying the assets or processes to be protected. A rating of importance should accompany each one, so that the more critical ones may be identified easily. A multi-level categorization is often sufficient; top-to-bottom ranking is usually not necessary. Then, for each asset or process, identify the risks. Risks are ways in which the asset or process can be slowed, corrupted, destroyed, or otherwise made unavailable. A risk is composed of a threat, the likelihood of the threat, and the potential resulting impact of the threat. Again, these should be rated so that the biggest risks (most likely and highest damage threats) are identified easily. Note that sources of threat may be intentional or not, from inside or outside, and from people or systems. Finally, for each of the identified risks, identify how that risk will be mitigated. Note the term “mitigate”: often a risk cannot be eliminated entirely without unrealistic expenditure, so the focus should be on realistic minimization of liability. Identifying the type of liability can help this ranking process. Does this affect the ability to continue doing business? Is there any direct impact on patient health? A drop in the number of patients that can be seen?
  12. This concludes Lecture a of Creating Fault-Tolerant Systems, Backups, and Decommissioning. In this lecture we defined fault tolerance as a system’s ability to continue performing to specification despite problems. The idea is to minimize downtime and maximize availability through reliability and redundancy. Hardware, Software, and Systems are three areas that should be fault tolerant. To make systems more fault-tolerant, use risk assessment to evaluate potential problems, then mitigate those potential problems using best practices.
  13. No Audio. Ten seconds of silence.
  14. No Audio.
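
The three-step risk assessment described in note 11 can be sketched as a small Python register. This is a minimal illustration under stated assumptions, not the lecture's own method: the 1 to 3 rating scale, the example ratings, and the names `Risk` and `rank_risks` are hypothetical. Each entry names an asset, a threat, and a mitigation (examples taken from the slide), and the score multiplies the probability and impact ratings, following the slide's Risk = Threat x Probability x Impact with the threat identifying the row.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    asset: str        # step 1: what is to be protected
    threat: str       # step 2: what can go wrong
    probability: int  # assumed scale: 1 (rare) to 3 (likely)
    impact: int       # assumed scale: 1 (minor) to 3 (severe)
    mitigation: str   # step 3: policy or control that reduces the risk

    @property
    def score(self) -> int:
        # Multi-level rating: likelihood of the threat times its impact.
        return self.probability * self.impact


def rank_risks(risks: list[Risk]) -> list[Risk]:
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Ratings below are made up for illustration only.
register = [
    Risk("clinical record", "record alteration", 1, 3, "automatic backup"),
    Risk("EHR server", "power failure", 2, 3, "UPS with power monitoring"),
]

for risk in rank_risks(register):
    print(f"{risk.score}: {risk.asset} / {risk.threat} -> {risk.mitigation}")
```

Sorting by score puts the most likely, highest-damage threats first, which is the ordering the notes recommend for deciding where mitigation effort goes.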