This document summarizes a research paper that developed a fault tolerance protocol called DRT-FTIP (Distributed Real Time – Fault Tolerance Integrity Protocol) for distributed real-time systems. The protocol is designed to function in dynamic networks and is coupled with an end-to-end distributed real-time scheduling algorithm (EOE-DRTSA) to increase integrity of scheduling. It has three phases - establishing communication, normal operation monitoring task execution, and error detection and recovery if tasks miss deadlines. The goal is to ensure tasks meet deadlines even in the presence of hardware or software faults.
An Investigation of Fault Tolerance Techniques in Cloud Computingijtsrd
Cloud computing which is created on Internet has the most powerful architecture of computation that provides users with the capabilities of information technology as a service and allows them to have access to these services without having specialized information or controlling the infrastructure. Fault tolerance has. The main advantages of using fault tolerance that has all the necessary techniques to keep active power and reliability in cloud computing include failure recovery, lower costs, and improved performance criteria. In this paper, we will investigation of the different techniques that are used for fault tolerance on cloud computing. Ya Min | Khin Myat Nwe Win | Aye Mya Sandar "An Investigation of Fault Tolerance Techniques in Cloud Computing" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26611.pdfPaper URL: https://www.ijtsrd.com/computer-science/distributed-computing/26611/an-investigation-of-fault-tolerance-techniques-in-cloud-computing/ya-min
An Investigation of Fault Tolerance Techniques in Cloud Computingijtsrd
Cloud computing which is created on Internet has the most powerful architecture of computation that provides users with the capabilities of information technology as a service and allows them to have access to these services without having specialized information or controlling the infrastructure. Fault tolerance has. The main advantages of using fault tolerance that has all the necessary techniques to keep active power and reliability in cloud computing include failure recovery, lower costs, and improved performance criteria. In this paper, we will investigation of the different techniques that are used for fault tolerance on cloud computing. Ya Min | Khin Myat Nwe Win | Aye Mya Sandar "An Investigation of Fault Tolerance Techniques in Cloud Computing" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26611.pdfPaper URL: https://www.ijtsrd.com/computer-science/distributed-computing/26611/an-investigation-of-fault-tolerance-techniques-in-cloud-computing/ya-min
Proactive cloud service assurance framework for fault remediation in cloud en...IJECEIAES
Cloud resiliency is an important issue in successful implementation of cloud computing systems. Handling cloud faults proactively, with a suitable remediation technique having minimum cost is an important requirement for a fault management system. The selection of best applicable remediation technique is a decision making problem and considers parameters such as i) Impact of remediation technique ii) Overhead of remediation technique ii) Severity of fault and iv) Priority of the application. This manuscript proposes an analytical model to measure the effectiveness of a remediation technique for various categories of faults, further it demonstrates the implementation of an efficient fault remediation system using a rulebased expert system. The expert system is designed to compute a utility value for each remediation technique in a novel way and select the best remediation technique from its knowledgebase. A prototype is developed for experimentation purpose and the results shows improved availability with less overhead as compared to a reactive fault management system.
ON FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDSijgca
Grid computing or computational grid is always a vast research field in academic, as well as in industry also. Computational grid provides resource sharing through multi-institutional virtual organizations for dynamic problem solving. Various heterogeneous resources of different administrative domain are virtually distributed through different network in computational grids. Thus any type of failure can occur at any point of time and job running in grid environment might fail. Hence fault tolerance is an important and challenging issue in grid computing as the dependability of individual grid resources may not be guaranteed. In order to make computational grids more effective and reliable fault tolerant system is necessary. The objective of this paper is to review different existing fault tolerance techniques applicable in grid computing. This paper presents state of the art of various fault tolerance technique and comparative study of the existing algorithms.
Proactive cloud service assurance framework for fault remediation in cloud en...IJECEIAES
Cloud resiliency is an important issue in successful implementation of cloud computing systems. Handling cloud faults proactively, with a suitable remediation technique having minimum cost is an important requirement for a fault management system. The selection of best applicable remediation technique is a decision making problem and considers parameters such as i) Impact of remediation technique ii) Overhead of remediation technique ii) Severity of fault and iv) Priority of the application. This manuscript proposes an analytical model to measure the effectiveness of a remediation technique for various categories of faults, further it demonstrates the implementation of an efficient fault remediation system using a rulebased expert system. The expert system is designed to compute a utility value for each remediation technique in a novel way and select the best remediation technique from its knowledgebase. A prototype is developed for experimentation purpose and the results shows improved availability with less overhead as compared to a reactive fault management system.
ON FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDSijgca
Grid computing or computational grid is always a vast research field in academic, as well as in industry also. Computational grid provides resource sharing through multi-institutional virtual organizations for dynamic problem solving. Various heterogeneous resources of different administrative domain are virtually distributed through different network in computational grids. Thus any type of failure can occur at any point of time and job running in grid environment might fail. Hence fault tolerance is an important and challenging issue in grid computing as the dependability of individual grid resources may not be guaranteed. In order to make computational grids more effective and reliable fault tolerant system is necessary. The objective of this paper is to review different existing fault tolerance techniques applicable in grid computing. This paper presents state of the art of various fault tolerance technique and comparative study of the existing algorithms.
Cloud computing has gained popularity over the years, some organizations are using some form of cloud
computing to enhance their business operations while reducing infrastructure costs and gaining more
agility by deploying applications and making changes to applications easily. Cloud computing systems just
like any other computer system are prone to failure, these failures are due to the distributed and complex
nature of the cloud computing platforms.
Cloud computing systems need to be built for failure to ensure that they continue operating even if the
cloud system has an error. The errors should be masked from the cloud users to ensure that users continue
accessing the cloud services and this intern leads to cloud consumers gaining confidence in the availability
and reliability of cloud services.
In this paper, we propose the use of N-Modular redundancy to design and implement failure-free clouds
Cloud computing has gained popularity over the years, some organizations are using some form of cloud
computing to enhance their business operations while reducing infrastructure costs and gaining more
agility by deploying applications and making changes to applications easily. Cloud computing systems just
like any other computer system are prone to failure, these failures are due to the distributed and complex
nature of the cloud computing platforms.
The Indo-American Journal of Agricultural and Veterinary Sciences is an online international journal published quarterly. It is a peer-reviewed journal that focuses on disseminating high-quality original research work, reviews, and short communications of the publishable paper.
SIMULATION-BASED APPLICATION SOFTWARE DEVELOPMENT IN TIME-TRIGGERED COMMUNICA...IJSEA
This paper introduces a simulation-based approach for design and test of application software for timetriggered
communication systems. The approach is based on the SIDERA simulation system that supports
the time-triggered real-time protocols TTP and FlexRay. We present a software development platform for
FlexRay based communication systems that provides an implementation of the AUTOSAR standard
interface for communication between host application and FlexRay communication controllers. For
validation, we present an application example in the course of which SIDERA has been deployed for
development and test of software modules for an automotive project in the field of driving dynamics
control.
Introduction to Operating Systems: Function, Evolution, Different Types, Desirable Characteristics and features of an O/S, Operating Systems Services: Types of Services, Different ways of providing these Services – Utility Programs, System Calls.
Steganography methods using network protocolsDr Amira Bibo
The covert channels of type covert storage channels were used in this research to
hide textual data or an (image hiding secrete data) in the transmission protocol layer and
the internet protocol layer of TCP/IP module. We use IP protocol in designing covert
channel by using Identification field, and use TCP protocol in designing covert channel
by using Urgent Pointer field, and UDP by using Source Port field. Finally ICMP
protocol by using Echo request message and Message field.
The result of Covert channel after analyzing the protocols (IP, TCP, UDP,
ICMP) header depends on the field that used in the designing the Covert channel. The
access ratio of hidden data in the two protocols TCP/IP was %100. UDP protocol on the
other hand depends on a mechanism of unsafe communication. ICMP protocol provided
a good transmission, though unreliable, in that the message structure can be unreliable
or clear.
An investigation for steganography using different color systemDr Amira Bibo
ABSTRACT
Steganographic techniques are generally used to maintain the confidentiality of
valuable information and to protect it from any possible theft or unauthorized use
especially over the internet. In this paper, Least Significant Bit LSB-based
Steganographic techniques is used to embed large of data in different color space
models, such as (RGB, HSV, YCbCr, YIQ, YUV). The idea can be summarized by
transforming the RGB value of the secret image pixels into three separate components
into the pixels of the cover image.
The measures (MSE, SNR, PSNR) were used to compare between the color
space models, the comparisons proved that steganography with color systems (RGB
and HIS) shown a best results.
ABSTRACT
Smartphones are used by billions of people that means the applications of the smartphone is increasing, it is out of control for applications marketplaces to completely validate if an application is malicious or legitimate. Therefore, it is up to users to choose for themselves whether an application is safe to use or not. It is important to say that there are differences between mobile devices and PC machines in resource management mechanism, the security solutions for computer malware are not compatible with mobile devices. Consequently, the anti-malware organizations and academic researchers have produced and proposed many security methods and mechanisms in order to recognize and classify the security threat of the Android operating system. By means of the proposed methods are different from one to another, they can be arranged into various classifications. In this review paper, the present Android security threats is discussed and present security proposed solutions and attempt to classify the proposed solutions and evaluate them.
Supervised classification and improved filtering method for shoreline detection.Dr Amira Bibo
ABSTRACT
Shoreline monitoring is important to overcome the problems in the measurement of the shoreline. Recently,
many researchers have directed attention to methods of predicting shoreline changes by the use of
multispectral images. However, the images being captured tend to have several problems due to the weather.
Therefore, identification of multi class features which includes vegetation and shoreline using multispectral
satellite image is one of the challenges encountered in the detection of shoreline. An efficient framework
using the near infrared–histogram equalisation and improved filtering method is proposed to enhance the
detection of the shoreline in Tanjung Piai, Malaysia, by using SPOT-5 images. Sub-pixel edge detection andthe Wallis filter are used to compute the edge location with the subpixel accuracy and reduce the noise. Then,the image undergoes image classification process by using Support Vector Machine. The proposed method performed more effectively and reliable in preserving the missing line of the shoreline edge in the SPOT-5
images.
ABSTRACT
In today’s world, the swift increase of utilizing mobile services and simultaneously discovering of the cloud computing services, made the Mobile Cloud Computing (MCC) selected as a wide spread technology among mobile users. Thus, the MCC incorporates the cloud computing with mobile services for achieving facilities in daily using mobile. The capability of mobile devices is limited of computation context, memory capacity, storage ability, and energy. Thus, relying on cloud computing can handle these troubles in the mobile surroundings. Cloud Computing gives computing easiness and capacity such provides availability of services from anyplace through the Internet without putting resources into new foundation, preparing, or application authorizing. Additionally, Cloud Computing is an approach to expand the limitations or increasing the abilities dynamically. The primary favourable position of Cloud Computing is that clients just use what they require and pay for what they truly utilize. Mobile cloud computing is a form for various services, where a mobile gadget is able to utilize the cloud for data saving, seeking, information mining, and multimedia preparing. Cloud computing innovation is also causes many new complications in side of safety and gets to direct when users store significant information with cloud servers. As the clients never again have physical ownership of the outsourced information, makes the information trustworthiness, security, and authenticity insurance in Cloud Computing is extremely difficult and conceivably troublesome undertaking. In MCC environments, it is hard to find a paper embracing most of the concepts and issues such as: architecture, computational offloading, challenges, security issues, authentications and so on. In this paper we discuss these concepts with presenting a review of the most recent papers in the domain of MCC.
EOE-DRTSA: end-to-end distributed real-time system scheduling algorithmDr Amira Bibo
In this paper, scheduling dependent threads in distributed real-time
system where considered. We present a distributed real-time
scheduling algorithm called (EOE-DRTSA (end-to-end distributed
real time system Scheduling algorithm)). Now a day completed realtime
systems are distributed. One of least developed areas of realtime
scheduling is distributed scheduling where in Distributed
systems action and information timeliness is often end-to-end.
Designers and users of distributed systems often need to dependably
reason about end-to-end timeliness. Our scheduling model includes
threads and their time constraints depend on developed DTUF value
and maintaining end-to-end prosperities of distributed real-time
system.
Constructing sierpinski gasket using gp us arraysDr Amira Bibo
A fractal is a mathematical set that typically displays self-similar
patterns, which means it is "the same from near as from far".
Fractals may be exactly the same at every scale, they may be
nearly the same at different scales. The concept of fractal extends
beyond trivial self-similarity and includes the idea of a detailed
pattern repeating itself. The algorithms to constructing different
fractal shapes in many cases typically involve large amounts of
floating point computation, to which modern GPUs are well
suited. In this paper we will construct Sierpinski Gasket using
GPUs arrays.
ABSTRACT
Shoreline monitoring is important to overcome the problems in the measurement of the shoreline. Recently,
many researchers have directed attention to methods of predicting shoreline changes by the use of
multispectral images. However, the images being captured tend to have several problems due to the weather.
Therefore, identification of multi class features which includes vegetation and shoreline using multispectral
satellite image is one of the challenges encountered in the detection of shoreline. An efficient framework
using the near infrared–histogram equalisation and improved filtering method is proposed to enhance the
detection of the shoreline in Tanjung Piai, Malaysia, by using SPOT-5 images. Sub-pixel edge detection and
the Wallis filter are used to compute the edge location with the subpixel accuracy and reduce the noise. Then,
the image undergoes image classification process by using Support Vector Machine. The proposed method
performed more effectively and reliable in preserving the missing line of the shoreline edge in the SPOT-5
images.
Model Attribute Check Company Auto PropertyCeline George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
Introduction to AI for Nonprofits with Tapp NetworkTechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
For more information, visit-www.vavaclasses.com
Instructions for Submissions thorugh G- Classroom.pptxJheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
Palestine last event orientationfvgnh .pptxRaedMohamed3
An EFL lesson about the current events in Palestine. It is intended to be for intermediate students who wish to increase their listening skills through a short lesson in power point.
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...Levi Shapiro
Letter from the Congress of the United States regarding Anti-Semitism sent June 3rd to MIT President Sally Kornbluth, MIT Corp Chair, Mark Gorenberg
Dear Dr. Kornbluth and Mr. Gorenberg,
The US House of Representatives is deeply concerned by ongoing and pervasive acts of antisemitic
harassment and intimidation at the Massachusetts Institute of Technology (MIT). Failing to act decisively to ensure a safe learning environment for all students would be a grave dereliction of your responsibilities as President of MIT and Chair of the MIT Corporation.
This Congress will not stand idly by and allow an environment hostile to Jewish students to persist. The House believes that your institution is in violation of Title VI of the Civil Rights Act, and the inability or
unwillingness to rectify this violation through action requires accountability.
Postsecondary education is a unique opportunity for students to learn and have their ideas and beliefs challenged. However, universities receiving hundreds of millions of federal funds annually have denied
students that opportunity and have been hijacked to become venues for the promotion of terrorism, antisemitic harassment and intimidation, unlawful encampments, and in some cases, assaults and riots.
The House of Representatives will not countenance the use of federal funds to indoctrinate students into hateful, antisemitic, anti-American supporters of terrorism. Investigations into campus antisemitism by the Committee on Education and the Workforce and the Committee on Ways and Means have been expanded into a Congress-wide probe across all relevant jurisdictions to address this national crisis. The undersigned Committees will conduct oversight into the use of federal funds at MIT and its learning environment under authorities granted to each Committee.
• The Committee on Education and the Workforce has been investigating your institution since December 7, 2023. The Committee has broad jurisdiction over postsecondary education, including its compliance with Title VI of the Civil Rights Act, campus safety concerns over disruptions to the learning environment, and the awarding of federal student aid under the Higher Education Act.
• The Committee on Oversight and Accountability is investigating the sources of funding and other support flowing to groups espousing pro-Hamas propaganda and engaged in antisemitic harassment and intimidation of students. The Committee on Oversight and Accountability is the principal oversight committee of the US House of Representatives and has broad authority to investigate “any matter” at “any time” under House Rule X.
• The Committee on Ways and Means has been investigating several universities since November 15, 2023, when the Committee held a hearing entitled From Ivory Towers to Dark Corners: Investigating the Nexus Between Antisemitism, Tax-Exempt Universities, and Terror Financing. The Committee followed the hearing with letters to those institutions on January 10, 202
Embracing GenAI - A Strategic ImperativePeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Developing fault tolerance integrity protocol for distributed real time systems
1. Raf. J. of Comp. & Math’s. , Vol. 10, No. 1, 2013
Fifth Scientific Conference Information Technology 2012 Dec. 19-20
187
Developing Fault Tolerance Integrity Protocol for Distributed Real Time Systems
Dhuha B. Abdullah Amira B. Sallow
College of Computer Sciences and Mathematics
University of Mosul
Received on: 21/10/2012 Accepted on: 30/01/2013
ﺍﻟﻤﻠﺨﺹ
ﺍﻟﺤﻘﻴﻘﻲ ﺍﻟﻭﻗﺕ ﺃﻨﻅﻤﺔ ﻓﻲﺍﻟﻤﻭﺯﻋﺔﻜﺎﻥ ﻭﺍﻥ ﺤﺘﻰ ﺍﻟﻤﺤﺩﺩ ﻭﻗﺘﻬﺎ ﻓﻲ ﺘﻨﺠﺯ ﺃﻥ ﺍﻟﻀﺭﻭﺭﻱ ﻤﻥ ﺍﻟﻤﻬﺎﻡ
ﺒﻌﻀ ﻫﻨﺎﻟﻙﹰﺎﺍﻟﺒﺭﻤﺠﻴﺔ ﺃﻭ ﺍﻟﻤﺎﺩﻴﺔ ﺍﻷﺨﻁﺎﺀ ﻤﻥ.ﻗﺩﺭﺓ ﻴﻌﻨﻲ ﺍﻟﻤﻭﺯﻋﺔ ﺍﻟﺤﻘﻴﻘﻲ ﺍﻟﺯﻤﻥ ﺃﻨﻅﻤﺔ ﻓﻲ ﺒﺎﻟﺨﻁﺄ ﺍﻟﺘﺤﻜﻡ
ﻭﺘﺠﺎﻭﺯﻫﺎ ﺍﻷﺨﻁﺎﺀ ﻭﺍﻜﺘﺸﺎﻑ ﻟﻠﻤﻬﺎﻡ ﺍﻟﻨﻬﺎﺌﻴﺔ ﺍﻟﺤﺩﻭﺩ ﺘﺤﻘﻴﻕ ﻋﻠﻰ ﺍﻟﻨﻅﺎﻡ.ﺍﻻﻋﺘﺒﺎﺭ ﺒﻨﻅﺭ ﺍﻷﺨﺫ ﺘﻡ ﺍﻟﺒﺤﺙ ﻫﺫﺍ ﻓﻲ
ﺒ ﺴﻤﻲ ﺒﺎﻟﺨﻁﺄ ﺍﻟﺘﺤﻜﻡ ﺒﺭﻭﺘﻭﻜﻭل ﺘﻁﻭﻴﺭ ﻭﺘﻡ ﺒﺎﻷﺨﻁﺎﺀ ﺍﻟﺘﺤﻜﻡ ﻤﺸﻜﻠﺔـ)ﻷﻨﻅﻤﺔ ﺍﻟﻤﺘﻜﺎﻤل ﺒﺎﻟﺨﻁﺄ ﺍﻟﺘﺤﻜﻡ ﺒﺭﻭﺘﻭﻜﻭل
ﺍﻟﻤﻭﺯﻋﺔ ﺍﻟﺤﻘﻴﻘﻲ ﺍﻟﺯﻤﻥ.(ﻋﻤلﺍﻟﻤﻭﺯﻋﺔ ﺍﻟﺤﻘﻴﻘﻲ ﺍﻟﺯﻤﻥ ﻷﻨﻅﻤﺔ ﺍﻟﺠﺩﻭﻟﺔ ﺘﻜﺎﻤل ﺯﻴﺎﺩﺓ ﻋﻠﻰ ﺍﻟﻤﻘﺘﺭﺡ ﺍﻟﺒﺭﻭﺘﻭﻜﻭل.
ABSTRACT
In the distributed real time systems, tasks must meet their deadline even in the
presence of hardware/software faults. Fault tolerance in distributed real time systems
refers to the ability of the system to meet the tasks deadline and to detect their failure
and recover them. In this paper, we considered the problem of fault tolerance and
developed a fault tolerance protocol called DRT-FTIP (Distributed Real Time – Fault
Tolerance Integrity Protocol).This protocol increases the integrity of the scheduling in
distributed real time systems.
1. Introduction
Real-time systems can be classified as hard real time systems in which the
consequences of missing a deadline can be catastrophic and soft real time systems
in which the consequences are relatively tolerable. In hard real time systems, it
is important that tasks complete within their deadline even in the presence of a
failure. Examples of hard real-time systems are control systems in space
stations, auto pilot systems and monitoring systems for patients with critical
conditions. In soft real-time systems, it is more important to economically detect a
fault as soon as possible rather than to mask a fault. Examples of soft real-time
systems are all kinds of airline reservation, banking, and E-commerce
applications[3].
A faulty system, due to any reason during processing some task, can cause some
damages. A task running on real time distributed system should be feasible, reliable and
scalable. The real time distributed system such as nuclear systems, robotics, air traffic
control systems, grid … etc. are highly dependable on deadline. A fault in real time
distributed system can result a system into failure if not properly detected and recovered
at time. These systems must function with high availability even under hardware and
software faults. Fault-tolerance is the important technique used to maintain
dependability in these systems. Hardware and software redundancy are well-known
effective methods. Hardware fault-tolerance achieved through applying extra hardware
like processors, communication links, resource (memory, I/O device) whereas in
2. Dhuha B. Abdullah & Amira B. Sallow
188
software fault tolerance tasks, messages are added into the system to deal with faults.
Fault should be detected by applying reliable fault detector followed by some recovery
technique. Many fault detection techniques are available but it is necessary to apply
appropriate fault detector. Unreliable fault detector can make mistake by erroneously
suspecting correct process or trusting crashed process[4].
1.1 Contributions
In this paper, we consider the problem of Fault tolerance in distributed real-time
systems. We design a Distributed Real Time – Fault Tolerance Integrity Protocol (DRT-
FTIP) that has the following prosperities:
1. Designed to function in dynamic network.
2. Coupled with EOE-DRTSA (End to End-Distributed Real Time Scheduling
Algorithm.
3. Control integrity and fault tolerant with proposed Distributed real-time system.
1.2 Related Work
Past works on developing fault tolerance integrity protocol by Edward Curley,
Binoy Ravindran, Jonathan Anderson, and E. Douglas Jensen who considered the
problem of recovering from failures of distributable threads in distributed real-time
systems that operate under run-time uncertainties including those on thread execution
times, thread arrivals, and node failure occurrences. They presented thread integrity
protocol called TPR[4].
Binoy Ravindran, Edward Curley, Jonathan Anderson, and E. Douglas Jensen
who considered the problem of recovering from failures of distributable threads in
distributed real-time systems. They presented a scheduling algorithm called HUA and
two thread integrity protocols called D-TPR and W-TPR [5].
2. Failure, Error, and Faults
Avizienis and others define the terms failure, error and faults as fellows [6]:
Failure: A failure system occurs when the service provided by the system deviates from
the specified service. For example, when a user cannot read his/her stored file from
computer memory, then the expected service is not provided by the system.
Error: An error is a perturbation of internal state of the system that may lead to failure.
A failure occurs when the erroneous state causes an incorrect service to be delivered, for
example, when certain portion of the computer
memory is corrupted or broken and stored files, therefore cannot be read by the user.
Fault: The cause of the error is called a fault. An active fault leads to an error;
otherwise the fault is dormant. For example, impurities in the semiconductor devices
may cause computer memory in the long run to behave unpredictably.
3. Techniques for Fault Tolerance
Fault tolerance is the ability to continue operating despite the failure of a
limited subset of their hardware or software. So, the goal of the system designer
is to ensure that the probability of system failure is acceptably small. There can be
either hardware fault or software fault, which disturbs the real time systems to meet
their deadlines. There are three types of faults: Permanent, intermittent, and transient.
A permanent fault does not die away with time, but remains until it is repaired
as the affected unit is replaced. This is an intermittent fault cycle between the fault–
3. Developing Fault Tolerance Integrity Protocol for Distributed Real Time Systems
189
active and fault benign states. A transient fault dies away after some time[4].There are
different types of fault which can occur in Real-Time Distributed System. These faults
can be classified depending on several factors such as:
Network fault: A Fault occurrence in a network is due to network partition, Packet
Loss, Packet corruption, destination failure, link failure, etc.
Physical faults: This Fault can occur in hardware like fault in CPUs, Fault in memory,
Fault in storage, etc. Media faults: Fault occurrence is due to media head crashes.
Processor faults: fault occurs in processor due to operating system crashes.
Process faults: The occurrence of fault is due to shortage of resource, software bugs.
Service expiry fault: The service time of a resource may expire, while application is
using it.
A fault can be categorized on the basis of computing resources and time. A
failure occurs during computation on system resources can be classified as: omission
failure, timing failure, response failure, and crash failure. Faults occur with respect to
time are shown in Figure 1 below:
Figure 1. Types of Fault in a System
Permanent: These failures occur by accidentally cutting a wire, power breakdowns and
so on. It is easy to reproduce these failures. These failures can cause major disruptions
and some part of the system may not be functioning as desired.
Intermittent: These are the failures which appear occasionally. Mostly these failures
are ignored while testing the system and only appear when the system goes into
operation. Therefore, it is hard to predict the extent of damage these failures can bring
to the system.
Transient: These failures are caused by some inherent fault in the system. However,
these failures are corrected by retrying roll back the system to previous state such as
restarting software or resending a message. These failures are very common in
computer systems [4].
4. The System Model
We consider a distributed system architecture model [1] consisting of
heterogeneous processors; a set of client nodes; and a set of server nodes. These nodes
were interconnected via a communication network. A single Global Scheduler for the
4. Dhuha B. Abdullah & Amira B. Sallow
190
system was responsible for computing the initial priorities for the tasks; scheduler utility
was checked based on information provided by the application programmer. As new
tasks were introduced to the system, the Global Scheduler distributes them to client
processors. A Local Scheduler in each processor was responsible for specifying the
order of threads executions on a node. Thus, scheduling decisions made by a node
scheduler are independent of that made by other node schedulers. Client Node
schedulers make scheduling decisions using thread scheduling attributes, which
typically include threads' time constraints (e.g., importance, urgency). When a new job
arrives at client node and its utility is larger than the utility of the currently executing
job, the currently job will be preempted and the new job will be scheduled for
execution.
EOE-DRTSA algorithm [1] was designed to overcome the shortcomings of
independent scheduling algorithms by taking into account global information, while
constructing schedule. In EOE-DRTSA scheduling, firstly the thread will be arrived to
the Global Scheduler server. The Global Scheduler will compute the initial Urgency and
DTUF value. We consider the DTUF (Developed TUF) function to be three
dimensional function that decouples importance and urgency of a thread, urgency is
measured on the X-axis, and importance is measured on the Y-axis. The Benefit is
denoted by DTUF and is measured on the Z-axis. After the DTUF values were
computed for the threads, the Global Scheduler will pass the threads to the target node
to be executed there and the DRT-FTIP protocol will be worked concurrently with the
EOE-DRTSA scheduling algorithm. When the thread arrived to the Local Scheduler
node, the node will schedule the thread depending on its local scheduling algorithm.
DRT-FTIP integrity protocol will monitor the scheduling work on the local scheduler.
5. Protocol Algorithm Description
We design an DRT-FTIP(Distributed Real Time – Fault Tolerance Integrity
Protocol) with EOE-DRTSA scheduling algorithm for distributed real time system.
This protocol will control the faults that occurred in this proposed system. In server
side, when a thread sent to the client the server will expect to receive (thread execute
message) from the client. This message means that the thread is executed successfully
and there is no error. The DRT-FTIP protocol has three operation phases:
Phase I:
Establish the protocol for the first time, the server will send Client_Hello
message and wait until the client responds with the message Server_Hello. After that,
the second phase will begin.
Phase II:
In this phase, there are two states of behavior:
a) Normal state behavior: This behavior will describe the successful work for the
DRT-FTIP protocol without any error. In server side, for each thread that has been
sent to the client, the server will send thread Execute_wait message and commit
wait state in the state field in the Execute_Table that belongs to the sender thread.
When the Execute_finish message arrived from the client the server changes the
state field in the Execute_Table to finished to point that the thread was executed
successfully. Figure(2) shows the normal behavior of DRT-FTIP integrity
protocol.
5. Developing Fault Tolerance Integrity Protocol for Distributed Real Time Systems
191
Figure 2. DRT-FTIP Integrity Protocol: Normal State Behavior
b) Anomaly state behavior: This behavior describes unusual work of
DRT-FTIP protocol. The server check the Execute_Table every Tp (time period).
If an error will be occurred in client side the client system will send exception to
the client application. Actually, exceptions were used for handling unusual errors
in programs that will be occurred during the program execution. The exception
handling mechanism was used to provide a means to detect and report exceptional
circumstances so that, an appropriate action will be taken. The mechanism
suggests incorporation of separate error handling code that performs the following
tasks:
a. Find the problem
b. Inform that an error has been occurred
c. Receive the error information
For any exceptions occurred, the client exception handling will send
error_message to the server explaining the error type. If the error was caused by a
thread, the server will try to send the thread and Thread Execute_wait message
again for three times until Execute_finish message will be arrived. If the error was
6. Dhuha B. Abdullah & Amira B. Sallow
192
caused by the client machine, the server may eliminate that thread from the system
or send it to another client and begins the same operations again. Figure(3) shows
the anomaly behavior for DRT-FTIP integrity protocol.
Phase III:
If there are no threads in the ready queue in server side or the server wants to
finish the protocol in a client, the server will send Client_finish messag and will recive
Server_finish message from the client.
Figure 3. DRT-FTIP Integrity Protocol: Anomaly State Behavior
6. Conclusion
In this paper, we developed a fault tolerance integrity protocol for distributed
real-Time systems. The advantage of the algorithm is that the threads considered
in this system are dynamic and aperiodic. The proposed protocol was simple and easy
to implement. The proposed protocol increases the integrity of the scheduling
algorithm used with it.
7. Developing Fault Tolerance Integrity Protocol for Distributed Real Time Systems
193
REFERENCES
[1] Amira Bibo, Dhuha Basheer, 2012, “EOE-DRTSA: End-to-End Distributed
Real-time System Scheduling Algorithm”, Dept. of Computer Science College
of Computer Sciences and Mathematics University of Mosul, under publishing.
[2] Arvind Kumar, Rama Shankar Yadav, Ranvijay, Anjali Jain, 2011, “Fault
Tolerance in Real Time Distributed System”, Department of Computer Science
and Engineering Motilal Nehru National Institute of Technology, Allahabad.
[3] Christy Persya, T.R.Gopalakrishnan Nair, 2008, “Fault Tolerant Real Time
Systems”, Department of Information Science and Engineering, The Oxford
College of Engineering, Bangalore, India.
[4] Edward Curley, Binoy Ravindran, Jonathan Anderson, E. Douglas Jensen, 2007,
”Recovering from Distributable Thread Failures in Distributed Real-Time Java”,
Virginia Tech Blacksburg, USA.
[5] Edward Curley, Binoy Ravindran, Jonathan Anderson, E. Douglas Jensen, 2006,
“Integrity Protocols for Recovering from Distributable Real-Time Thread
Failures with Assured Timeliness in Dynamic Systems”, Virginia Tech
Blacksburg, USA.
[6] Risat Mahmudpathan, 2010, ”Scheduling Algorithms For Fault-Tolerant Real-
Time Systems”, Department of Computer Science and Engineering, Chalmers
University of Technology, Goteborg, Sweden.