The checkpointing algorithm described in the document takes a distributed approach where processes coordinate to take consistent checkpoints. Each process tracks the number of messages sent to and received from other processes during a checkpoint interval. When a checkpoint is initiated, this status information is shared. Processes only checkpoint after verifying that the number of sent and received messages match for all other processes, ensuring consistency. By tracking dependencies, only processes dependent on a failed process need rollback for recovery. This minimizes unnecessary rollbacks compared to naive approaches.
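The pairwise check described above can be sketched in a few lines; the names `sent`, `received`, and `is_consistent` are illustrative, not taken from the paper:

```python
def is_consistent(sent, received):
    """Return True when every message counted as sent has also been counted
    as received, i.e. sent[i][j] == received[j][i] for every process pair.
    Equal counts mean no message is in transit or orphaned, so the recorded
    local states form a consistent global checkpoint."""
    procs = sent.keys()
    return all(sent[i][j] == received[j][i]
               for i in procs for j in procs if i != j)
```

In a real protocol these counters would be piggybacked on the checkpoint-request round; here they are plain nested dictionaries indexed by process name.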
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...CSCJournals
Mobile systems typically use wireless communication, which is based on electromagnetic waves and utilizes a shared broadcast medium. This has made mobile distributed computing environments possible and has brought several new challenges in distributed protocol design, such as limited transmission range, limited power supply due to battery capacity, and mobility of processes. These new issues make traditional recovery algorithms unsuitable. In this paper, we propose hierarchical non-blocking coordinated checkpointing algorithms suitable for mobile distributed computing. The algorithm is non-blocking, requires minimal message logging, has minimal stable storage requirements, produces a consistent set of checkpoints, and requires a minimum number of processes to take checkpoints.
Review of Some Checkpointing Schemes for Distributed and Mobile Computing Env...Eswar Publications
Fault tolerance techniques enable systems to carry out tasks in the presence of faults. A checkpoint is a local state of a process saved on stable storage. In a distributed system, since the processes do not share memory, a global state of the system is defined as a combination of local states, one from each process. In case of a fault in a distributed system, checkpointing enables the execution of a program to be resumed from a previous consistent global state rather than from the beginning, so the amount of useful processing lost because of the fault is appreciably reduced. In this paper, we discuss various issues related to checkpointing for distributed systems and mobile computing environments. We also discuss the main types of checkpointing: coordinated checkpointing, asynchronous checkpointing, communication-induced checkpointing, and message-logging-based checkpointing. We also present a survey of some checkpointing algorithms for distributed systems.
Cooperating processes work together to complete tasks by sharing resources like CPU, memory, and I/O devices. They communicate using shared memory or message passing. Shared memory allows faster communication by sharing an address space, while message passing is slower but can be used across devices. The critical section problem occurs when multiple processes access shared resources simultaneously, potentially corrupting data. Solutions ensure mutual exclusion so only one process is in the critical section at a time through techniques like disabling interrupts or using lock variables.
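As a concrete illustration of the lock-variable approach, a minimal Python sketch (variable and function names are our own):

```python
import threading

counter = 0                       # shared resource
lock = threading.Lock()           # lock variable guarding the critical section

def increment(n):
    global counter
    for _ in range(n):
        with lock:                # mutual exclusion: one thread at a time
            counter += 1          # critical section: read-modify-write

threads = [threading.Thread(target=increment, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter is now exactly 40000; without the lock, interleaved
# read-modify-write sequences could lose updates.
```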
A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...Neelamani Samal
This document summarizes a fault tolerant token-based atomic broadcast algorithm that relies on an unreliable failure detector and satisfies the responsive property. The algorithm aims to tolerate processor-level failures in a distributed system. It divides a job into tasks, uses a token to control access to shared resources, and monitors task execution times. If a task does not respond within the timeout period, it is declared faulty and removed from the ready queue. The algorithm was implemented on a multi-core processor to simulate fault tolerance capabilities in a distributed system within a specified time interval.
This document discusses several key concepts in distributed systems including event ordering, mutual exclusion, concurrency control, deadlock handling, and election algorithms. It provides details on implementing happened-before relations to ensure event ordering. It describes centralized and distributed approaches for mutual exclusion and discusses two-phase commit and locking protocols for concurrency control. It also covers deadlock prevention techniques like wait-die and would-wait schemes and algorithms for distributed deadlock detection and coordinator election.
This document discusses several key concepts in distributed systems including event ordering, mutual exclusion, concurrency control, deadlock handling, and election algorithms. It provides details on implementing happened-before relations to ensure event ordering. It describes centralized and distributed approaches for mutual exclusion and discusses two-phase commit and locking protocols for concurrency control. It also covers deadlock prevention techniques like timestamp ordering and various distributed deadlock detection algorithms. Finally, it summarizes bully and ring algorithms for electing a new coordinator when failures occur.
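The happened-before relation mentioned above is typically implemented with Lamport clocks; a minimal sketch (the class and method names are our own):

```python
class LamportClock:
    """Each process keeps a counter; ticking on local/send events and taking
    max(local, received) + 1 on receive preserves happened-before ordering."""
    def __init__(self):
        self.time = 0
    def tick(self):                      # local event or message send
        self.time += 1
        return self.time
    def receive(self, msg_time):         # message arrival
        self.time = max(self.time, msg_time) + 1
        return self.time

p, q = LamportClock(), LamportClock()
ts = p.tick()          # p sends a message stamped 1
q.tick()               # an unrelated local event at q (clock becomes 1)
q.receive(ts)          # q's clock jumps to max(1, 1) + 1 = 2
```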
Communication And Synchronization In Distributed Systemsguest61205606
This document discusses various topics related to communication and synchronization in distributed systems. It covers concepts like communication protocols, remote procedure calls, client-server and peer-to-peer models, blocking vs non-blocking communication, reliability, group communication, message ordering, and synchronization techniques including clock synchronization algorithms, mutual exclusion algorithms, and atomic transactions.
This document presents a two-phase algorithm to establish consistent checkpoints for recovery in a multi-process distributed system. In phase one, a coordinator process assigns tasks to cohort processes and divides its weight evenly among them. When a cohort completes its task, it returns its weight to the coordinator and takes a tentative checkpoint. Once the coordinator's weight reaches its initial value, no messages are in transit and a consistent global state is reached. In phase two, the coordinator tells all processes to finalize their checkpoints for recovery from failures. The algorithm aims to minimize the number of additional processes that must rollback when one process fails.
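The weight mechanism in phase one can be sketched as follows; exact fractions avoid rounding error, and the class and method names are illustrative, not from the paper:

```python
from fractions import Fraction

class Coordinator:
    def __init__(self):
        self.weight = Fraction(1)        # all weight starts at the coordinator
    def assign(self):
        half = self.weight / 2           # send part of the weight with a task
        self.weight -= half
        return half
    def collect(self, w):
        self.weight += w                 # a cohort returns its weight when done
    def quiescent(self):
        return self.weight == 1          # full weight back: nothing in transit
```

When `quiescent()` first becomes true, no task or message carrying weight is outstanding, so the tentative checkpoints form a consistent global state and phase two can commit them.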
Checkpoint and recovery protocols are commonly used in distributed applications to provide fault tolerance. A distributed system may take checkpoints from time to time so that, in case of failure, it can roll back to a checkpoint at which global consistency is preserved. Checkpointing is a fault-tolerance technique for recovering from faults and restarting jobs quickly, and checkpointing algorithms for distributed systems have been under study for years.
It is known that checkpointing and rollback recovery are widely used techniques that allow a distributed computation to make progress in spite of failures. There are two fundamental approaches to checkpointing and recovery. In the asynchronous approach, processes take their checkpoints independently; taking a checkpoint is very simple, but the absence of a recent consistent global checkpoint may force a deep rollback of the computation. The synchronous approach assumes that a single process, other than the application processes, invokes the checkpointing algorithm periodically to determine a consistent global checkpoint.
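The synchronous approach amounts to a two-phase commit over checkpoints; a toy sketch with an in-memory stand-in for stable storage (all names are our own, not from any specific protocol):

```python
class Proc:
    def __init__(self, state):
        self.state = state
        self.tentative = None            # provisional checkpoint
        self.stable = None               # committed checkpoint
    def take_tentative(self):
        self.tentative = self.state      # save local state provisionally
        return True                      # ack sent back to the initiator
    def commit(self):
        self.stable = self.tentative     # promote to a permanent checkpoint

def coordinated_checkpoint(procs):
    # Phase 1: ask every process for a tentative checkpoint and collect acks.
    if all(p.take_tentative() for p in procs):
        # Phase 2: unanimous success, so make every checkpoint permanent.
        for p in procs:
            p.commit()
        return True
    return False
```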
Distributed deadlock occurs when processes are blocked while waiting for resources held by other processes in a distributed system without a central coordinator. There are four conditions for deadlock: mutual exclusion, hold and wait, non-preemption, and circular wait. Deadlock can be addressed by ignoring it, detecting and resolving occurrences, preventing conditions through constraints, or avoiding it through careful resource allocation. Detection methods include centralized coordination of resource graphs or distributed probe messages to identify resource waiting cycles. Prevention strategies impose timestamp or age-based priority to resource requests to eliminate cycles.
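Centralized detection reduces to finding a cycle in the merged wait-for graph; a depth-first sketch (function and variable names are illustrative):

```python
def has_cycle(wfg):
    """wfg maps each process to the set of processes it is waiting for;
    a cycle in this graph is exactly a deadlock."""
    visited, on_stack = set(), set()
    def dfs(p):
        visited.add(p)
        on_stack.add(p)
        for q in wfg.get(p, ()):
            if q in on_stack or (q not in visited and dfs(q)):
                return True              # back edge found: cycle
        on_stack.discard(p)
        return False
    return any(dfs(p) for p in wfg if p not in visited)
```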
A comparative analysis of minimum process coordinated checkpointing algorithm...IAEME Publication
This document summarizes and compares several minimum-process coordinated checkpointing algorithms for mobile distributed systems. It begins with background on checkpointing techniques in distributed systems and issues specific to mobile distributed systems. It then describes five minimum-process coordinated checkpointing protocols: the Cao and Singhal algorithm, the Kumar and Kumar algorithm, the Silva and Silva algorithm, Koo-Toueg's algorithm, and the Cao and Singhal blocking algorithm. For each algorithm, it provides a brief overview of the approach and key aspects. The document aims to provide a comparative analysis of existing minimum-process checkpointing protocols for mobile distributed systems.
A comparative analysis of minimum process coordinatediaemedu
This document summarizes checkpointing algorithms for mobile distributed systems. It discusses minimum-process coordinated checkpointing as a preferred approach. It describes asynchronous and synchronous checkpointing, and categorizes the latter into minimum-process and all-process algorithms. Minimum-process algorithms only require interacting processes to checkpoint, while all-process checkpoints all processes. Blocking and non-blocking variants are also discussed. The document focuses on challenges of mobility for traditional distributed system checkpointing algorithms.
PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...caijjournal
Simultaneous Multi-Threading (SMT) processors improve system performance by allowing concurrent execution of multiple independent threads that share key datapath components, giving better utilization of resources. Speculative execution allows modern processors to fetch continuously and reduce the delays caused by control instructions. However, a significant amount of resources is usually wasted on mis-speculation that could have been used by valid instructions, and such waste is even more pronounced in an SMT system. In order to minimize this waste, a speculative-trace capping technique [1] was proposed to limit the number of speculative instructions in the system. In this paper, a thorough analysis investigates the trade-offs of applying this capping mechanism at different pipeline stages so as to maximize its benefits. Our simulations show that the best choice can improve overall system throughput by a very significant margin (up to 46%) without sacrificing execution fairness among the threads.
Computer Applications: An International Journal (CAIJ)caijjournal
Computer Applications: An International Journal (CAIJ) is a Quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Computer Science Applications. The journal is devoted to the publication of high quality papers on theoretical and practical aspects of computer science applications. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on Computer science application advancements, and establishing new collaborations in these areas. Original research papers, state-of-the-art reviews are invited for publication in all areas of Computer Science Applications.
Authors are solicited to contribute to the journal by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the areas of Computer Science Applications.
This document discusses deadlock detection in distributed systems. It begins with defining deadlock and providing an example of a deadlock situation. It then explains that deadlock detection is more challenging in distributed systems due to factors like message loss and lack of shared memory. The document outlines three strategies for deadlock handling - detection and recovery, prevention, and avoidance. It proposes two approaches for deadlock detection in distributed systems: 1) using a central coordinator to merge wait-for graphs or 2) having all machines broadcast their wait-for graphs to detect deadlocks in a distributed way. Both approaches have drawbacks like single point of failure or overhead.
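The distributed alternative is probe-based edge chasing: a blocked process launches a probe along the wait-for edges, and deadlock is declared if the probe returns to its initiator. A simplified sketch with a global view of the edges (a real protocol would forward probes by message passing; the names are our own):

```python
def probe_detects_deadlock(wfg, initiator):
    """Follow wait-for edges from the initiator; a probe returning to the
    initiator means it sits on a cycle, i.e. a deadlock."""
    frontier = list(wfg.get(initiator, ()))
    seen = set()
    while frontier:
        p = frontier.pop()
        if p == initiator:
            return True                       # the probe came back: deadlock
        if p not in seen:
            seen.add(p)
            frontier.extend(wfg.get(p, ()))   # forward the probe
    return False
```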
Fault recovery tactics in software systems include:
- Voting with redundant components to detect and correct faults. Diversity uses different software/hardware to detect algorithm faults.
- Active redundancy keeps redundant components synchronized in real-time, allowing recovery in milliseconds by switching to backups.
- Passive redundancy uses a primary component with backup components that are periodically resynchronized, allowing recovery in seconds.
- Using spare components requires reconfiguring software and state on the spare, increasing recovery time to minutes.
- Other tactics for recovering failed components include running them in shadow mode, resynchronizing their state, and rolling back to checkpoints.
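The passive-redundancy tactic above can be sketched as a primary that periodically copies its state to a backup; on failover the backup resumes from the last synchronized state (class names are illustrative):

```python
class Primary:
    def __init__(self):
        self.state = {"seq": 0}
    def work(self):
        self.state["seq"] += 1           # normal processing mutates state
    def sync_to(self, backup):
        backup.state = dict(self.state)  # periodic resynchronization

class Backup:
    def __init__(self):
        self.state = {}
    def promote(self):
        return self.state                # resume from the last synced state
```

Work done after the last sync is lost on failover, which is why passive redundancy recovers in seconds rather than the milliseconds of active redundancy.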
This chapter discusses fault tolerance in distributed systems. It covers process resilience through replicating processes into groups that can tolerate failures. It also discusses reliable multicasting to synchronize processes and distributed commit protocols for ensuring atomicity. The chapter objectives are to discuss these topics as well as failure recovery through state saving. It describes different types of faults, failure modes, and how redundancy can be used to mask faults.
Efficient failure detection and consensus at extreme-scale systemsIJECEIAES
Distributed systems and extreme-scale systems have become ubiquitous in recent years and are found throughout academic, business, home, and government sectors. Peer-to-peer (P2P) technology is a typical distributed system model that is gaining popularity for delivering computing resources and services. Distributed systems try to remain available in the event of frequent component failures, and keeping the system functioning in such scenarios is notoriously difficult. In order to identify component failures and achieve global agreement (consensus) about failed components, this paper implements an efficient failure detection and consensus algorithm based on fail-stop process failures. The proposed algorithm is fault-tolerant to process failures occurring before and during its execution. It works with the epidemic gossip protocol, a randomized paradigm of computation and communication that is both fault-tolerant and scalable. A simulation of an extreme-scale information dissemination process shows that global agreement can be achieved. A P2P simulator, PeerSim, is used to implement and test the proposed algorithm. The results exhibited high scalability while detecting all process failures. The status of all processes is maintained in a Boolean matrix.
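A gossip round in the spirit of the epidemic protocol mentioned above can be sketched as follows: each informed node pushes the rumor (for example, a detected failure) to one random peer per round. The function and parameter names are our own, not from the paper:

```python
import random

def gossip(n, informed, rng, max_rounds=200):
    """Push-gossip until all n nodes are informed (or max_rounds elapse);
    returns the informed set and the number of rounds used."""
    informed = set(informed)
    rounds = 0
    while len(informed) < n and rounds < max_rounds:
        for node in list(informed):
            informed.add(rng.randrange(n))   # tell one random peer
        rounds += 1
    return informed, rounds

# Starting from a single informed node, the rumor spreads to all 8 nodes
# in a logarithmic number of rounds with overwhelming probability.
nodes, rounds = gossip(8, {0}, random.Random(1))
```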
A New Function-based Framework for Classification and Evaluation of Mutual Ex...CSCJournals
This paper presents a new function-based framework for mutual exclusion algorithms in distributed systems. In the traditional classification, mutual exclusion algorithms were divided into two groups: Token-based and Permission-based. Recently, new algorithms have been proposed to increase fault tolerance, minimize message complexity, and decrease synchronization delay. Although studies in this field so far can compare and evaluate the algorithms, this paper takes a step further and proposes a new function-based framework, with a brief introduction to the algorithms in four groups: Token-based, Permission-based, Hybrid, and K-mutual exclusion. In addition, because existing performance criteria are dispersed and obscure, it introduces four parameters that can be used to compare distributed mutual exclusion algorithms: message complexity, synchronization delay, decision theory, and node configuration. We hope the proposed framework provides a suitable context for a technical and clear evaluation of existing and future methods.
International Journal of Engineering and Science Invention (IJESI)inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Users Approach on Providing Feedback for Smart Home Devices – Phase IIijujournal
Smart Home technology has achieved extraordinary success in making individuals' lives simpler and more relaxing, and has recently brought about numerous smart and refined frameworks that advance intelligent-living technology. In this paper, we investigate the behavioral intention behind users' approach to providing feedback for smart home devices. We conduct an online survey of a sample of three to five students, selected by simple random sampling, to study users' motives for giving feedback on smart home devices and their expectations. We observed that most users are ready to actively share their input on smart home devices to improve the product's service and quality, fulfill users' needs, and make their lives easier.
Similar to CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR
Communication And Synchronization In Distributed Systemsguest61205606
This document discusses various topics related to communication and synchronization in distributed systems. It covers concepts like communication protocols, remote procedure calls, client-server and peer-to-peer models, blocking vs non-blocking communication, reliability, group communication, message ordering, and synchronization techniques including clock synchronization algorithms, mutual exclusion algorithms, and atomic transactions.
This document discusses various topics related to communication and synchronization in distributed systems. It covers concepts like communication protocols, remote procedure calls, client-server and peer-to-peer models, blocking vs non-blocking communication, reliability, group communication, message ordering, and synchronization techniques including clock synchronization algorithms, mutual exclusion algorithms, and atomic transactions.
Communication And Synchronization In Distributed Systemsguest61205606
This document discusses various topics related to communication and synchronization in distributed systems. It covers concepts like communication protocols, remote procedure calls, client-server and peer-to-peer models, blocking vs non-blocking communication, reliability, group communication, message ordering, and synchronization techniques including clock synchronization algorithms, mutual exclusion algorithms, and atomic transactions.
This document presents a two-phase algorithm to establish consistent checkpoints for recovery in a multi-process distributed system. In phase one, a coordinator process assigns tasks to cohort processes and divides its weight evenly among them. When a cohort completes its task, it returns its weight to the coordinator and takes a tentative checkpoint. Once the coordinator's weight reaches its initial value, no messages are in transit and a consistent global state is reached. In phase two, the coordinator tells all processes to finalize their checkpoints for recovery from failures. The algorithm aims to minimize the number of additional processes that must rollback when one process fails.
Checkpoint and recovery protocols are commonly used in distributed applications for providing fault
tolerance. A distributed system may require taking checkpoints from time to time to keep it free of arbitrary
failures. In case of failure, the system will rollback to checkpoints where global consistency is preserved.
Checkpointing is one of the fault-tolerant techniques to restore faults and to restart job fast. The algorithms
for checkpointing on distributed systems have been under study for years.
It is known that checkpointing and rollback recovery are widely used techniques that allow a distributed
computing to progress inspite of a failure.There are two fundamental approaches for checkpointing and
recovery.One is asynchronus approach, process take their checkpoints independenty.So,taking checkpoints
is very simple but due to absence of a recent consistent global checkpoint which may cause a rollback of
computation.Synchronus checkpointing approach assumes that a single process other than the application
process invokes the checkpointing algorithm periodically to determine a consistent global checkpoint.
Distributed deadlock occurs when processes are blocked while waiting for resources held by other processes in a distributed system without a central coordinator. There are four conditions for deadlock: mutual exclusion, hold and wait, non-preemption, and circular wait. Deadlock can be addressed by ignoring it, detecting and resolving occurrences, preventing conditions through constraints, or avoiding it through careful resource allocation. Detection methods include centralized coordination of resource graphs or distributed probe messages to identify resource waiting cycles. Prevention strategies impose timestamp or age-based priority to resource requests to eliminate cycles.
A comparative analysis of minimum process coordinated checkpointing algorithm...IAEME Publication
This document summarizes and compares several minimum-process coordinated checkpointing algorithms for mobile distributed systems. It begins with background on checkpointing techniques in distributed systems and issues specific to mobile distributed systems. It then describes five minimum-process coordinated checkpointing protocols: the Cao and Singhal algorithm, the Kumar and Kumar algorithm, the Silva and Silva algorithm, Koo-Toueg's algorithm, and the Cao and Singhal blocking algorithm. For each algorithm, it provides a brief overview of the approach and key aspects. The document aims to provide a comparative analysis of existing minimum-process checkpointing protocols for mobile distributed systems.
A comparative analysis of minimum process coordinatediaemedu
This document summarizes checkpointing algorithms for mobile distributed systems. It discusses minimum-process coordinated checkpointing as a preferred approach. It describes asynchronous and synchronous checkpointing, and categorizes the latter into minimum-process and all-process algorithms. Minimum-process algorithms only require interacting processes to checkpoint, while all-process checkpoints all processes. Blocking and non-blocking variants are also discussed. The document focuses on challenges of mobility for traditional distributed system checkpointing algorithms.
PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...caijjournal
Simultaneous Multi-Threading (SMT) processors improve system performance by allowing concurrent execution of multiple independent threads that share key datapath components, giving better utilization of resources. Speculative execution allows modern processors to fetch continuously and reduce the delays of control instructions. However, a significant amount of resources is usually wasted due to mis-speculation, resources which could have been used by other valid instructions, and such waste is even more pronounced in an SMT system. In order to minimize this waste, a speculative trace capping technique [1] was proposed to limit the number of speculative instructions in the system. In this paper, a thorough analysis is given to investigate the trade-offs of applying this capping mechanism at different pipeline stages so as to maximize its benefits. Our simulations show that the best choice can improve overall system throughput by a very significant margin (up to 46%) without sacrificing execution fairness among the threads.
Computer Applications: An International Journal (CAIJ)caijjournal
Computer Applications: An International Journal (CAIJ) is a Quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Computer Science Applications. The journal is devoted to the publication of high quality papers on theoretical and practical aspects of computer science applications. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on Computer science application advancements, and establishing new collaborations in these areas. Original research papers, state-of-the-art reviews are invited for publication in all areas of Computer Science Applications.
Authors are solicited to contribute to the journal by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the areas of Computer Science Applications.
This document discusses deadlock detection in distributed systems. It begins with defining deadlock and providing an example of a deadlock situation. It then explains that deadlock detection is more challenging in distributed systems due to factors like message loss and lack of shared memory. The document outlines three strategies for deadlock handling - detection and recovery, prevention, and avoidance. It proposes two approaches for deadlock detection in distributed systems: 1) using a central coordinator to merge wait-for graphs or 2) having all machines broadcast their wait-for graphs to detect deadlocks in a distributed way. Both approaches have drawbacks like single point of failure or overhead.
Fault recovery tactics in software systems include:
- Voting with redundant components to detect and correct faults. Diversity uses different software/hardware to detect algorithm faults.
- Active redundancy keeps redundant components synchronized in real-time, allowing recovery in milliseconds by switching to backups.
- Passive redundancy uses a primary component with backup components that are periodically resynchronized, allowing recovery in seconds.
- Using spare components requires reconfiguring software and state on the spare, increasing recovery time to minutes.
- Other tactics for recovering failed components include running them in shadow mode, resynchronizing their state, and rolling back to checkpoints.
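The voting tactic in the first bullet can be sketched as a simple majority voter over redundant component outputs (an illustrative assumption; the `vote` function and its error handling are not from the document):

```python
from collections import Counter

def vote(results):
    """Majority voter over the outputs of redundant components.

    Returns the majority value, or raises if no strict majority
    exists (the fault could not be masked).
    """
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority: fault not maskable")
    return value


# Triple modular redundancy: one faulty replica is outvoted.
print(vote([42, 42, 41]))   # → 42
```

With diversity, the three replicas would additionally run different implementations (or hardware), so a single algorithmic bug is unlikely to produce the same wrong answer in a majority of them.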
This chapter discusses fault tolerance in distributed systems. It covers process resilience through replicating processes into groups that can tolerate failures. It also discusses reliable multicasting to synchronize processes and distributed commit protocols for ensuring atomicity. The chapter objectives are to discuss these topics as well as failure recovery through state saving. It describes different types of faults, failure modes, and how redundancy can be used to mask faults.
Efficient failure detection and consensus at extreme-scale systemsIJECEIAES
Distributed systems and extreme-scale systems have become ubiquitous in recent years and are seen throughout academia, business, home, and government sectors. Peer-to-peer (P2P) technology is a typical distributed system model that is gaining popularity for delivering computing resources and services. Distributed systems try to increase their availability in the event of frequent component failures, and operating a system in such a scenario is notoriously difficult. In order to identify component failures in the system and achieve global agreement (consensus) on failed components, this paper implements an efficient failure detection and consensus algorithm based on fail-stop process failures. The proposed algorithm is fault-tolerant to process failures occurring before and during its execution. It works with the epidemic gossip protocol, a randomized paradigm of computation and communication that is both fault-tolerant and scalable. A simulation of an extreme-scale information dissemination process shows that global agreement can be achieved. A P2P simulator, PeerSim, is used to implement and test the proposed algorithm. The results exhibited high scalability while detecting all the process failures. The status of all the processes is maintained in a Boolean matrix.
A New Function-based Framework for Classification and Evaluation of Mutual Ex...CSCJournals
This paper presents a new function-based framework for mutual exclusion algorithms in distributed systems. In the traditional classification, mutual exclusion algorithms were divided into two groups: token-based and permission-based. Recently, some new algorithms have been proposed in order to increase fault tolerance, minimize message complexity and decrease synchronization delay. Although studies in this field can already compare and evaluate the algorithms, this paper takes a step further and proposes a new function-based framework as a brief introduction to the algorithms in four groups: token-based, permission-based, hybrid and k-mutual exclusion. In addition, because existing performance criteria are dispersed and obscure, it introduces four parameters that can be used to compare various distributed mutual exclusion algorithms: message complexity, synchronization delay, decision theory and node configuration. We hope the proposed framework provides a suitable context for a technical and clear evaluation of existing and future methods.
International Journal of Engineering and Science Invention (IJESI)inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Users Approach on Providing Feedback for Smart Home Devices – Phase IIijujournal
Smart Home technology has accomplished extraordinary success in making individuals' lives more straightforward and relaxed. Technology has recently brought about numerous savvy and refined frameworks that advance intelligent living. In this paper, we investigate the behavioral intention behind users' approach to providing feedback for smart home devices. We conduct an online survey of a sample of three to five students, selected by simple random sampling, to study users' motives for giving feedback on smart home devices and their expectations. We have observed that most users are ready to actively share their input on smart home devices to improve the product's service and quality, fulfill users' needs and make their lives easier.
October 2023-Top Cited Articles in IJU.pdfijujournal
International Journal of Ubiquitous Computing (IJU) is a quarterly open access peer-reviewed journal that provides an excellent international forum for sharing knowledge and results in the theory, methodology and applications of ubiquitous computing. The current information age is witnessing a dramatic use of digital and electronic devices in the workplace and beyond. Ubiquitous computing presents a rather arduous requirement of robustness, reliability and availability to the end user, and has received significant and sustained research interest in terms of designing and deploying large-scale, high-performance computational applications in real life. The aim of the journal is to provide a platform for researchers and practitioners from both academia and industry to meet and share cutting-edge developments in the field.
ACCELERATION DETECTION OF LARGE (PROBABLY) PRIME NUMBERSijujournal
This document discusses methods for efficiently generating large prime numbers for use in RSA cryptography. It presents experimental results measuring the time taken to generate prime numbers when trial dividing the starting number by different numbers of initial primes before applying the Miller-Rabin primality test. The optimal number of trial divisions can be estimated as B=E/D, where E is the time for Miller-Rabin test and D is the maximum usefulness of trial division. Experimental results on different sized numbers support dividing by around 20 initial primes as optimal.
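The generation scheme described (trial-divide the candidate by a batch of small primes, then apply Miller-Rabin) can be sketched in Python. The choice of 20 trial primes follows the experimental result above; the function names and round count are illustrative assumptions:

```python
import random

SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
                31, 37, 41, 43, 47, 53, 59, 61, 67, 71]  # first 20 primes

def miller_rabin(n, rounds=20):
    """Miller-Rabin test: True means n is (probably) prime."""
    if n in (2, 3):
        return True
    if n < 2 or n % 2 == 0:
        return False
    d, s = n - 1, 0
    while d % 2 == 0:         # write n - 1 as d * 2**s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False      # a is a witness: n is composite
    return True

def next_probable_prime(n):
    """Scan odd candidates, trial-dividing by small primes before Miller-Rabin."""
    if n % 2 == 0:
        n += 1
    while True:
        # Cheap trial division rejects most candidates before the costly test.
        if all(n % p for p in SMALL_PRIMES if p < n):
            if miller_rabin(n):
                return n
        n += 2


print(next_probable_prime(90))   # → 97
```

Trial division is the cheap filter and Miller-Rabin the expensive confirmation, which is why the break-even point B = E/D from the experiments determines how many small primes are worth trying first.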
A novel integrated approach for handling anomalies in RFID dataijujournal
Radio Frequency Identification (RFID) is a convenient technology employed in various applications. The
success of these RFID applications depends heavily on the quality of the data stream generated by RFID
readers. The various anomalies found predominantly in RFID data limit the widespread adoption of
this technology. Our work eliminates the anomalies present in RFID data in an effective manner so that
it can be applied to high-end applications. Ours is a hybrid approach of middleware and
deferred cleaning, because it is not always possible to remove all anomalies and redundancies in middleware. The
processing of the remaining anomalies is deferred until query time, where they are cleaned by business rules. Experimental
results show that the proposed approach performs the cleaning more effectively than
existing approaches.
UBIQUITOUS HEALTHCARE MONITORING SYSTEM USING INTEGRATED TRIAXIAL ACCELEROMET...ijujournal
Ubiquitous healthcare has become one of the prominent areas of research in order to address the
challenges encountered in the healthcare environment. Contributing to this area, this study developed a
system prototype that recommends diagnostic services based on physiological data collected in real time
from a distant patient. The prototype uses WBAN body sensors worn by the individual and an Android
smart phone as a personal server. Physiological data is collected and uploaded to a Medical Health
Server (MHS) via GPRS/internet to be analysed. Our implemented prototype monitors the activity, location
and physiological data such as SpO2 and Heart Rate (HR) of the elderly and patients in rehabilitation. The
uploaded information can be accessed in real time by medical practitioners through a web application.
ENHANCING INDEPENDENT SENIOR LIVING THROUGH SMART HOME TECHNOLOGIESijujournal
The population of elderly folks is ballooning worldwide as people live longer. But getting older often
means declining health and trouble living solo. Smart home tech could keep an eye on old folks and get
help quickly when needed so they can stay independent. This paper looks at a system combining wireless
sensors, video watches, automation, resident monitoring, emergency detection, and remote access. Sensors
track health signs, activities, appliance use. Video analytics spot odd stuff like falls. Sensor fusion and
machine learning find normal patterns so wonks can see unhealthy changes and send alerts. Multi-channel
alerts reach caregivers and emergency folks. A LabVIEW can integrate devices and enables local and
remote oversight and can control and handle emergency responses. Benefits seem to be early illness clues,
quick help, less burden on caregivers, and optimized home settings. But will old folks use all this tech? Can
we prove it really helps folks live longer and better? More research on maximizing reliability and
evaluating real-world impacts is needed. But designed thoughtfully, smart homes could profoundly
improve the aging experience.
HMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCEijujournal
In today's Internet world, log file analysis is becoming a necessary task for analyzing customer
behavior in order to improve advertising and sales; for datasets in domains like the environment, medicine and
banking, it is equally important to analyze log data to extract the required knowledge. Web mining is the
process of discovering knowledge from web data. Log files are generated very fast, at a
rate of 1-10 MB/s per machine, and a single data center can generate tens of terabytes of log data in a day.
These datasets are huge, and analyzing them requires a parallel processing system and a
reliable data storage mechanism. A virtual database system is an effective solution for integrating the data,
but it becomes inefficient for large datasets. The Hadoop framework provides reliable data storage through the
Hadoop Distributed File System and a parallel processing model for large datasets through MapReduce.
The Hadoop Distributed File System breaks up input data and sends fractions of the
original data to several machines in the Hadoop cluster to hold blocks of data. This mechanism helps to
process log data in parallel using all the machines in the cluster and computes the result efficiently.
The dominant approach provided by Hadoop, "store first, query later", loads the data into the Hadoop
Distributed File System and then executes queries written in Pig Latin. This approach reduces the response
time as well as the load on the end system. This paper proposes a log analysis system using Hadoop
MapReduce which provides accurate results in minimum response time.
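The map and reduce steps such a log analyzer runs on Hadoop can be illustrated in plain Python. The three-field log format and function names here are assumptions for the sketch; a real deployment would submit the same logic as a Hadoop MapReduce job:

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    # Emit (URL, 1) for each request line of a hypothetical
    # minimal log format: "<ip> <url> <status>".
    ip, url, status = line.split()
    yield url, 1

def reducer(pairs):
    # Sum the counts per key, as a Hadoop reducer would per URL.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


log = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /cart.html 200",
    "10.0.0.1 /index.html 304",
]
hits = reducer(chain.from_iterable(mapper(line) for line in log))
print(hits)   # → {'/index.html': 2, '/cart.html': 1}
```

Because the mapper looks at one line at a time and the reducer only needs pairs grouped by key, the framework can split the log across machines, run mappers in parallel on each block, and shuffle the pairs to reducers, which is exactly what makes the approach scale to terabytes.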
SERVICE DISCOVERY – A SURVEY AND COMPARISONijujournal
The document summarizes and compares several major service discovery approaches. It provides an overview of service discovery objectives and techniques, then surveys prominent protocols including SLP, Jini, and UPnP. Each approach is analyzed based on features like service description, discovery architecture, announcement/query mechanisms, and how they handle service usage and dynamic network changes. The comparison aims to identify strengths and limitations to guide future research in improving service discovery.
SIX DEGREES OF SEPARATION TO IMPROVE ROUTING IN OPPORTUNISTIC NETWORKSijujournal
Opportunistic Networks are able to exploit social behavior to create connectivity opportunities. This
paradigm uses pair-wise contacts for routing messages between nodes. In this context we investigated if the
“six degrees of separation” conjecture of small-world networks can be used as a basis to route messages in
Opportunistic Networks. We propose a simple approach for routing that outperforms some popular
protocols in simulations that are carried out with real world traces using ONE simulator. We conclude that
static graph models are not suitable for underlay routing approaches in highly dynamic networks like
Opportunistic Networks without taking account of temporal factors such as time, duration and frequency of
previous encounters.
PERVASIVE COMPUTING APPLIED TO THE CARE OF PATIENTS WITH DEMENTIA IN HOMECARE...ijujournal
The aging population and the consequent increase in the incidence of dementias are causing many
challenges for health systems, mainly related to infrastructure, low service quality and high costs. One
solution is to provide care at the patient's house, through home care services. However, this is not a
trivial task, since a patient with dementia requires constant care and monitoring from a caregiver, who
suffers physical and emotional overload. In this context, this work presents a modelling approach for the development of
pervasive systems aimed at helping the care of these patients, in order to lessen the burden on the caregiver
while the patient continues to receive the necessary care.
A proposed Novel Approach for Sentiment Analysis and Opinion Miningijujournal
As people become dependent on the internet, the need for analysis of user views is increasing
exponentially. Customers post their experiences and opinions about products, policies and services, but
because of the massive volume of reviews, customers cannot read them all. To solve this problem,
a lot of research is being carried out in Opinion Mining. Through Opinion Mining, we can learn the contents of whole
product reviews. Blogs are websites that allow one or more individuals to write about things they want to
share with others. The valuable data contained in posts from a large number of users across geographic,
demographic and cultural boundaries provide a rich data source not only for commercial exploitation but
also for psychological and sociopolitical research. This paper tries to demonstrate the plausibility of the idea
through our clustering and classifying opinion mining experiment on blog posts about recent
product, policy and service reviews. We propose a novel approach for analysing reviews for
customer opinion.
USABILITY ENGINEERING OF GAMES: A COMPARATIVE ANALYSIS OF MEASURING EXCITEMEN...ijujournal
Usability engineering and usability testing are concepts that continue to evolve. Interesting research studies
and new ideas come up every now and then. This paper tests the hypothesis of using EDA-based
physiological measurements as a usability testing tool by considering three measures: observers'
opinions, self-reported data and EDA-based physiological sensor data. These data were analyzed
comparatively and statistically. It concludes by discussing the findings obtained from those
subjective and objective measures, which partially support the hypothesis.
SECURED SMART SYSTEM DESING IN PERVASIVE COMPUTING ENVIRONMENT USING VCSijujournal
Ubiquitous Computing uses mobile phones or tiny devices for application development, with sensors
embedded in mobile phones. Collecting and storing the information generated by these devices is a big
task, and transmitting the data to the intended destination is delay tolerant. In this paper, we
propose a new security algorithm for providing security to a Pervasive Computing
Environment (PCE) system using a Public-key Encryption (PKE) algorithm, a Biometric Security (BS)
algorithm and a Visual Cryptography Scheme (VCS) algorithm. The proposed PCE monitoring system
automates various home appliances using VCS and also provides security against intrusion using a Zigbee
IEEE 802.15.4 based sensor network; GSM and Wi-Fi networks are embedded through a standard home
gateway.
PERFORMANCE COMPARISON OF ROUTING PROTOCOLS IN MOBILE AD HOC NETWORKSijujournal
Routing protocols have an important role in any Mobile Ad Hoc Network (MANET). Researchers have
elaborated several routing protocols that possess different performance levels. In this paper we give a
performance evaluation of AODV, DSR, DSDV, OLSR and DYMO routing protocols in Mobile Ad Hoc
Networks (MANETS) to determine the best in different scenarios. We analyse these MANET routing
protocols by using NS-2 simulator. We specify how the Number of Nodes parameter influences their
performance. In this study, performance is calculated in terms of Packet Delivery Ratio, Average End to
End Delay, Normalised Routing Load and Average Throughput.
The document compares the performance of various optical character recognition (OCR) tools. It analyzes eight OCR tools - Online OCR, Free Online OCR, OCR Convert, Convert image to text.net, Free OCR, i2OCR, Free OCR to Word Convert, and Google Docs. The document provides sample outputs of each tool processing the same input image. It then evaluates the tools based on character accuracy, character error rate, special symbol accuracy, and special symbol error rate to determine which tools most accurately convert images to editable text.
Optical Character Recognition (OCR) is a technique used to convert a scanned image into editable text
format. Many different types of OCR tools are commercially available
today, and it is a useful and popular method for different types of applications. The accuracy of an OCR
result depends on text pre-processing and segmentation algorithms. Image quality is one of the most
important factors that improves the quality of recognition in OCR tools. Images can be processed
independently (.png, .jpg, and .gif files) or in multi-page PDF documents (.pdf). The primary objective of
this work is to provide an overview of various Optical Character Recognition (OCR) tools and to analyse
their performance by applying the two factors of OCR tool performance, i.e. accuracy and error rate.
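The character error rate used in such evaluations is conventionally derived from the edit distance between the reference text and the recognized text. A standard sketch (the function names are assumptions; the document does not specify its exact formula):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between reference and recognized text."""
    m, n = len(ref), len(hyp)
    prev = list(range(n + 1))            # distances for the empty prefix of ref
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + cost)   # substitution (or match)
        prev = cur
    return prev[n]

def char_error_rate(ref, hyp):
    # Errors per reference character; accuracy is 1 - CER.
    return edit_distance(ref, hyp) / len(ref)


print(char_error_rate("checkpoint", "checkpo1nt"))   # → 0.1
```

A tool's character accuracy is then simply one minus this rate, so the two factors in the comparison are two views of the same edit-distance measurement.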
DETERMINING THE NETWORK THROUGHPUT AND FLOW RATE USING GSR AND AAL2Rijujournal
In multi-radio wireless mesh networks, one node is eligible to transmit packets over multiple channels to
different destination nodes simultaneously. This feature of multi-radio wireless mesh network makes high
throughput for the network and increase the chance for multi path routing. This is because the multiple
channel availability for transmission decreases the probability of the most elegant problem called as
interference problem which is either of interflow and intraflow type. For avoiding the problem like
interference and maintaining the constant network performance or increasing the performance the WMN
need to consider the packet aggregation and packet forwarding. Packet aggregation is process of collecting
several packets ready for transmission and sending them to the intended recipient through the channel,
while the packet forwarding holds the hop-by-hop routing. But choosing the correct path among different
available multiple paths is most the important factor in the both case for a routing algorithm. Hence the
most challenging factor is to determine a forwarding strategy which will provide the schedule for each
node for transmission within the channel. In this research work we have tried to implement two forwarding
strategies for the multi path multi radio WMN as the approximate solution for the above said problem. We
have implemented Global State Routing (GSR) which will consider the packet forwarding concept and
Aggregation Aware Layer 2 Routing (AAL2R) which considers the both concept i.e. both packet forwarding
and packet aggregation. After the successful implementation the network performance has been measured
by means of simulation study.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsDianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for
seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
"Scaling RAG Applications to serve millions of users", Kevin GoedeckeFwdays
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months. Lessons from technical challenges around managing high load for LLMs, RAGs and Vector databases.
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures and what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure that happened in February 2022. We'll see, what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on Ukraine experience
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Choosing The Best AWS Service For Your Website + API.pptx
CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR
International Journal of UbiComp (IJU), Vol.6, No.4, October 2015
DOI:10.5121/iju.2015.6403

CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR
Sarmistha Neogy
Department of Computer Science & Engineering, Jadavpur University, India
Abstract:
This paper describes a two-fold approach towards utilizing Triple Modular Redundancy (TMR) in a Wireless Adhoc Network (AdhocNet). A distributed checkpointing and recovery protocol is proposed. The protocol eliminates useless checkpoints and selects for recovery only the processes dependent in the concerned checkpointing interval. A process starts recovery from its last checkpoint only if it finds that it is dependent (directly or indirectly) on the faulty process. The recovery protocol also prevents the occurrence of missing or orphan messages. In AdhocNet, a set of three mutually connected nodes is considered to form a TMR set, the members being designated as main, primary and secondary. A main node in one set may serve as primary or secondary in another. Computation is not triplicated, but each checkpoint taken by the main is duplicated in its primary so that the primary can continue if the main fails. Each checkpoint taken by the primary is in turn duplicated in the secondary so that the secondary can continue if the primary fails too.
Keywords:
checkpointing, dependency tracking, rollback recovery, adhoc networks, triple modular redundancy
1. INTRODUCTION
Distributed systems that execute processes on different nodes connected by a communication
network [6] are prone to failure. One of the widely used approaches for providing fault tolerance
is the checkpoint/rollback recovery mechanism. Checkpointing is the method of periodically
recording the state of the system in stable storage. The saving of process state information may be
required for error recovery, debugging and other distributed applications [7]. This periodically
saved state is called the checkpoint of the process [7, 8]. A global state [22] is a set of individual
process states, one per process [7]. The state contains a snapshot at some instant during the
execution of a process. The snapshot is required to be consistent to avoid the domino effect [23]
that is, multiple rollbacks during recovery.
One of the most well-known methods of achieving fault tolerance is the Triple Modular Redundant (TMR) [25] system. A minimum of three processors, also known as replicas, form a redundant group to perform replicated processing. Identical processing and distributed voting are performed on the same input data. The intermediate result or output from each replica is exchanged with the others and majority voted upon. After successful majority voting the replicas either resume processing on the intermediate results or end their computation if it had been the final result. Communications among the replicas take place via communication links. A replica at the receiving end waits for a time-out period [26] before concluding that there may be fault
in another part of the system. This concept of TMR is utilized in this work as a measure for achieving fault tolerance in a wireless adhoc network (AdhocNet), where a group of three nodes, known as mobile hosts (MH), form the three replicas. Fault tolerance may also be achieved by periodically using the stable storage of the MHs to save the processes' states, better known as checkpoints, during failure-free execution. When a failure occurs, the failed process restarts from its latest checkpoint. This minimizes the amount of lost computation. The proposed system is recoverable even if more than one failure (at most two) occurs in a TMR node. As is well known, a wireless adhoc network does not have any infrastructure facilities and hence each MH also acts as a router. The concept of distributed systems is extended to the wireless environment.
A TMR group consists of MHs that act as main, primary and secondary. The TMR groups are not exclusive; that is, an MH acting as main may act as primary in another TMR group, and so on. The concept of TMR is modified here in the sense that the three MHs do not perform identical processing throughout; instead, the checkpoint taken by a main MH is replicated in its primary MH.
This is because in case a main MH fails, the primary MH can continue from the latest checkpoint.
Similarly, the secondary MH receives a copy of the checkpoint every time the primary MH takes
one. This continues until the primary MH fails. The communicating partners of this TMR group
however are unaware of this change in the actual partner at the other end. It is assumed that
several processes that are running on the MHs may communicate with each other depending on
application requirement.
In the present work checkpointing is initiated by a process of the system. In fact, each of the
processes takes turn to act as the initiator. Generally, processes take local checkpoints after being
notified by the initiator excepting special cases described in later sections. The processes
synchronize their activities of the current checkpointing interval before finally committing their
checkpoints. This removes inconsistency, if any, and then checkpoints are committed. The
technique adopted in the present paper thus disallows the formation of zigzag paths and zigzag cycles [4,11]. The checkpointing pattern described in the present paper takes only those checkpoints that will contribute to a consistent global snapshot, thereby eliminating "useless" checkpoints (checkpoints that do not contribute to global consistency). Maintaining
consistency is necessary to avoid the domino effect in case any process fails after taking its ith
checkpoint. If the set of the ith checkpoints can be proved to be consistent, then in case of
recovery the system has to roll back only up to the ith checkpoint since that set provides a
consistent and hence a stable global state of the system.
The processes do not append status information to each and every computation message but keep updating their own status whenever a message is sent or received. This information is required to find process dependence during recovery. Though the simplest way is to roll back all processes, this causes unnecessary rollbacks. To avoid such rollbacks the processes in the present system exchange status information whenever a rollback is decided. Each process can find out for itself whether it requires rollback depending upon its relationship with the failed process.
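This dependency-based rollback selection can be sketched roughly as follows: a process needs to roll back only if it lies in the transitive closure of communication starting from the failed process. The function and variable names below are illustrative assumptions, not the paper's notation; the actual status-exchange mechanism is described in Section 5.

```python
# Minimal sketch (assumed names) of dependency-based rollback selection:
# a process rolls back iff it is directly or indirectly dependent on the
# failed process within the current checkpointing interval.
def rollback_set(failed, comm):
    """comm[i] = set of processes Pi exchanged messages with this interval."""
    need = {failed}
    frontier = {failed}
    while frontier:
        # processes that communicated with an already-affected process
        nxt = {q for p in frontier for q in comm[p]} - need
        need |= nxt
        frontier = nxt
    return need

comm = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}   # P3 never communicated
assert rollback_set(0, comm) == {0, 1, 2}       # P3 need not roll back
```

P1 is directly dependent on the failed P0, P2 only indirectly (via P1), yet both roll back, while the non-communicating P3 continues unaffected.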
This paper describes that any global checkpoint taken in the above-mentioned fashion in the
present system is not only consistent but also eliminates taking unnecessary checkpoints and the
system has to roll back only to the last saved state in case of a failure. Also, following the rollback algorithm described in the paper, not all processes in the system have to roll back. The rest
of the paper is organized as follows. Section 2 throws light on some related works in this area.
Section 3 describes the system model, Section 4 discusses in details the checkpointing algorithm
with a proof, Section 5 discusses the rollback procedure and the algorithm, Section 6 discusses
integration of the activities of TMR AdhocNet and the last one, that is, Section 7 concludes the
paper.
2. RELATED WORKS
With reference to Chandy & Lamport [1] and Wang et al. [24], Tsai & Kuo [23] state that "A
global checkpoint M is consistent if no message is sent after a checkpoint of M and received
before another checkpoint of M". Following these observations we regard consistency as the
scenario where if a sender 'S' sends a message 'm' before it has taken its ith checkpoint, then
message 'm' must be received by a receiver 'R' before the receiver has taken its ith checkpoint. A message will be termed missing if its send is recorded but its receipt is not; otherwise it is termed orphan [21]. Suppose a node fails after taking its ith checkpoint. It is
desirable that the system in such a scenario should roll back to the last (ith) saved state and
resume execution from there. If a system can ensure that there is no missing or orphan message in
the concerned ith global checkpoint, then the set of all the ith checkpoints taken by its constituent
processes is bound to be consistent. Unlike the distributed approach one would expect in a distributed system, Kalaiselvi and Rajaraman [5] keep records at the message sending end and at the message receiving end, and a centralized checkpoint coordinator matches the logs it gets from all the processes at each checkpointing time. The present system also keeps records of messages sent and received in each process, but the log is matched in a distributed fashion. Due to disparity in speed or congestion in
the network, a message belonging to (i+1)th checkpointing interval may reach its receiver who
has not yet taken its ith checkpoint. Such a message is discarded in [21] and sender retransmits it.
Another method of dealing with such messages is to prevent their occurrences by compelling the
sender to wait for a certain time before sending a message after any checkpoint [13]. The present
work discards such a message by adopting a technique in receiving whereas in another approach
[11] any process refrains from sending during the interval between the receipt of checkpoint
initiation message and completion of committing that checkpoint. Distributed systems that use the
recovery block approach [17] and have a common time base may estimate a time by which the
participating processes would take acceptance tests. These estimated instants form the pseudo
recovery point times as described in [16]. Such a scheme has more than one disadvantage: for example, fast processes may have to wait for slow processes to catch up, and other fault tolerance mechanisms like time-out may be required. In [9,10] the authors have analyzed checkpoints taken
in a distributed system having loosely synchronized clocks [13,14,18,19]. No special
synchronization messages have been used in those methods but the existing clock synchronization
messages were utilized. The work described in [4], however, allows processes to take checkpoints on their own, and then a consistent global checkpoint is constructed from the set of local checkpoints. The drawback of the method is that useless checkpoints cannot be avoided. The
approach taken by Strom et al. in [20] does not maintain a consistent global checkpoint at all
times but has to save enough information to construct such a checkpoint when need arises. So,
this requires logging of messages. Contrary to the present checkpointing protocol, the authors in [2,15] present a minimal snapshot collection protocol where dependency is calculated during
checkpointing also and hence the actual time taken for formal commitment or abort of a
checkpoint is not fixed. The concept of weight distribution and collection by the initiator in [2,15]
appears superfluous and can be replaced if a participating process sends a list of processes
dependent on it to the initiator. The overhead of checkpointing (in terms of the number of
checkpoints) is great in all of the CAS, CBR, CASBR and NRAS [8] protocols in comparison to
the protocol presented in this paper. The present protocol possesses the Rollback-Dependency-Trackability (RDT) property described in [22], as shown in Section 5.
A few checkpointing recovery techniques for mobile computing systems (infrastructured wireless
and mobile networks) are described in the literature. In the two-tier checkpointing approach [27],
coordinated checkpointing is used between the Mobile Service Stations (MSS) to reduce the
number of checkpoints to be stored to a minimum. In [28] the log of unacknowledged messages is kept in the stable storage of the home station (that is, an MSS) of the mobile host. Gass and Gupta [29] in their algorithm take three kinds of checkpoints: communication-induced (taken after receiving an application message), local (taken when an MH leaves the MSS to which it is connected) and forced (in which only the local variables are updated). All applications are assumed to be blocked during the algorithm execution, thereby wasting computation power, and the information that a failure has occurred is assumed to reach all fault-free processes within finite time, which is difficult in reality. This algorithm saves battery power by minimizing the recomputation time.
The work in [30] describes checkpointing and recovery using TMR in wireless infrastructured
network. The authors have described a checkpointing and recovery protocol for infrastructured
mobile system in [31]. The authors in [32] utilized the concept of mobile agents for checkpointing
purposes in mobile systems. The works in [31] and [32] utilized different approaches towards
checkpointing for infrastructure mobile systems. The authors have considered an attack model
and augmented Mobile Adhoc Network with security features in their work in [33]. The work in
[33] has enabled us to consider any AdhocNet routing algorithm for the present work.
3. SYSTEM MODEL AND ASSUMPTIONS
Let us consider a system of 'n' processes, P0, P1, ..., Pn-1. Let the checkpoints (for the kth process) be denoted as the initial checkpoint CPk^0 (i = 0), first checkpoint CPk^1 (i = 1), second checkpoint CPk^2 (i = 2) and so on. The time interval between any two consecutive checkpoints is called the checkpointing interval and is numbered by the next checkpoint number. This means that the first checkpointing interval is the interval between the initial checkpoint and the first checkpoint. The initial checkpoint is taken when the system is being initialized. The processes communicate via messages only. We assume the following properties of the system:
1. Initiation of checkpointing at regular intervals is done by processes. The initial checkpoint is
taken upon system initialization and initiated by P0. The next checkpoint initiation is done by
P1 and so on and so forth.
2. Asynchronous communication has been assumed among the processes. Acknowledgement
and time-out are part of the communication protocol.
3. A process is aware of the TMR group it belongs to and its role in that group.
4. Any AdhocNet routing algorithm may be used.
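Assumption 1's round-robin initiation can be stated compactly: the initiator of checkpoint number k is P(k mod n). The sketch below is purely illustrative; the function name is an assumption, not part of the paper's notation.

```python
# Round-robin checkpoint initiation (assumption 1): the initial checkpoint
# (index 0) is initiated by P0, the next by P1, and so on, wrapping around.
def initiator_of(check_index, n):
    return check_index % n

n = 4
# Initiators for checkpoint indices 0..5 in a 4-process system
assert [initiator_of(k, n) for k in range(6)] == [0, 1, 2, 3, 0, 1]
```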
4. CHECKPOINTING
4.1 The Algorithm
The algorithm has a checkpoint initiator and uses explicit checkpoint synchronization messages.
The initiator sends the initiation message to all others along with further information: number of
messages sent to processes in the current checkpointing interval and number of messages
received from processes in the current checkpointing interval. It must be mentioned here that the
additional information regarding messages would not be sent during the initial checkpoint since it
is taken just after the system has been initialized and hence it is assumed that communication
among processes has not yet started. The information means that if Pk has sent a total of two
messages to Pj in the current checkpointing interval, then Pk would write 2 as number of messages
and j as process id as part of the first information. Similarly, if Pj has indeed received both messages from Pk, it would write 2 as number of messages and k as process id as part of the
second information. Pj checks whether the total number of messages sent by Pk matches with that
received by Pj. If the answer is positive, Pj takes the checkpoint. If not, then it waits for the
unreceived message/s and takes the checkpoint after receiving it/them. During this time only
those messages are received for which Pj is waiting and any unwanted message is discarded [20].
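The count-matching step above can be sketched as follows. This is a minimal, illustrative Python sketch, not the paper's pseudocode: the array names loosely follow the mess_sent_to/mess_recd_fm variables of Section 4.1, and the delivery helpers are assumptions standing in for the communication layer.

```python
# Sketch of the sent/received count matching a process performs before
# taking its checkpoint (assumed helper names, 3-process example).
n = 3
sent = [[0] * n for _ in range(n)]   # sent[i][j]: messages Pi sent to Pj this interval
recd = [[0] * n for _ in range(n)]   # recd[i][j]: messages Pi received from Pj

def send_msg(i, j):
    sent[i][j] += 1                  # message leaves Pi ...

def deliver(i, j):
    recd[j][i] += 1                  # ... and later arrives at Pj

send_msg(0, 1); deliver(0, 1)        # a delivered message
send_msg(2, 1)                       # a message still in flight at checkpoint time

def missing(j):
    """Messages other processes report sent to Pj that Pj has not yet received."""
    return [(i, sent[i][j] - recd[j][i]) for i in range(n)
            if i != j and sent[i][j] != recd[j][i]]

assert missing(1) == [(2, 1)]        # P1 must wait for one message from P2
deliver(2, 1)                        # the late message arrives
assert missing(1) == []              # counts match: P1 may take its checkpoint
```

Only when `missing` is empty for a process do its recorded sends and receipts agree with every partner, which is exactly the condition under which Pj takes its checkpoint.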
The algorithm works as follows:
The initial checkpoint is taken after system initialization in lines 7-10 (for the initiator) and lines
14-16 (for other processes) in Algorithm 1. For any other checkpoint, the initiator first sends a "request for checkpoint" message followed by a message containing its status information for the
current checkpointing interval (lines 11-12). Any other process on receiving the above (lines 16-
17) sends its own status information to all other processes (line 18) and waits for receiving such
information from the others (line 19). After it receives status information from others it goes on to
check whether there is any message that has been sent by some other process to it but not yet
received by it (lines 20-23). It waits to receive the said message/s and then takes the checkpoint
(lines 24-25).
The variables used in the algorithm are described as follows:
initiator: pid of checkpoint initiator
check_index: checkpoint sequence number
own_pid: self process id
msg_type: denotes a tag for identifying various kinds of messages:
0: checkpoint-request message
1: process-status-information message
any other: computation message
mess_sent_toi[j]: an array; values of whose indices denote the number of messages
that the concerned process has sent to (i.e. if the value in
mess_sent_toi[j] is n, this means that Pi (concerned process) has sent n
messages to Pj in the current checkpointing interval)
mess_recd_fmi[j]: an array; values of whose indices denote the number of messages that the
concerned process has received from (i.e. if the value in
mess_recd_fmi[j] is n, this means that Pi (concerned process) has
received n messages from Pj in the current checkpointing interval)
The subroutines used in the algorithm are as follows:
send, receive: communication primitives
take_checkpoint: saves process state
recv_sp: for executing the receive communication primitive with added
logic like checking message type, message sequence number or
even the sender etc
send_sp: for executing the send communication primitive with added logic
like checking message type, message sequence number or even the
receiver etc
The structure of the checkpointing algorithm is given below along with the line numbers
mentioned in the leftmost column:
1 Procedure Checkpoint(Pi)
2 {
3 initiator := 0;
4 check_index := 0;
5 dest_id := -1;
6 if (initiator = own_pid)
7 if (check_index = 0)
8 { msg_type = 0;
9 send_sp(msg_type, dest_id, check_index, seq_no);
10 take_checkpoint; }
11 else{msg_type:= 0; send_sp(msg_type,dest_id, check_index,seq_no);
12 msg_type := 1; send_sp(msg_type,dest_id, check_index, seq_no);}
13 else if (check_index = 0) {
14 recv_sp(recd_msg_type, send_id, recd_check_index, seq_no);
15 take_checkpoint; }
16 else{recv_sp(recd_msg_type,send_id, recd_check_index,seq_no);
17 recv_sp(recd_msg_type, send_id, recd_check_index, seq_no);
18 send_sp(recd_msg_type, dest_id, check_index, seq_no);
19 recv_sp(recd_msg_type, send_id, recd_check_index, seq_no);
20 for (i = 0; i<= n-1, i++)
21 for (j = 0; j <= n-1 , j++) {
22 if (i <> j) {
23 if (mess_sent_toi[j] <> mess_recd_fmj[i])
24 if (own_pid = j)
25 recv_sp(recd_msg_type,send_id,recd_check_index,seq_no);}
26 take_checkpoint; }
27 check_index := check_index + 1; }
The algorithm recv_sp works as follows: In lines 2-6 it receives only those messages whose checkpoint number equals the receiver's checkpoint number and whose message sequence number matches the expected message sequence. In lines 7-9 the algorithm receives the checkpoint initiation messages. In lines 10-12 the algorithm receives messages from other processes containing corresponding status information.
1 Procedure recv_sp(mtype,pid, checkid, seqno)
2 {
3 If ((mtype <> 0) AND (mtype <> 1))
4 { If ((checkid = check_index) AND (seqno <= mess_sent_toi[j])
5 AND (seqno >= (mess_recd_fmj[i] + 1)))
6 receive(rmtype, pid, check_id, r_seq, recv_mess) }
7 else ( if ( mtype = 0)
8 if (checkid = 0) receive(rmtype, checkid ) from P0 ;
9 else receive(rmtype, checkid);
10 else (if ( ( mtype = 1) AND (checkid <> 0) AND (pid = -1) )
11 for (k = 0; ((k<=n) AND (k <> own_pid)); k++)
12 receive(rmtype,checkid, mess_sent_to, mess_recd_fm);
13 } }
The algorithm send_sp works as follows: In lines 2 – 4 it sends computation messages
and in lines 5 – 8 it sends own status information to all others.
1 Procedure send_sp(mtype,pid, checkid, seqno)
2 {
3 if (((mtype <> 0) AND (mtype <> 1)) AND (pid <> -1))
4 {send(mtype, pid, checkid,seqno, mess) }
5 else
6 if ((mtype = 0) OR (mtype = 1))
7 { for (k = 0; ((k<=n) AND (k <> own_pid)); k++)
8 send(mtype, checkid, mess_sent_to, mess_recd_fm);
9 } }
4.2 Brief Proof
Theorem:
The checkpoints taken by the algorithm form a consistent global checkpoint.
Proof: The theorem is proved by contradiction. Let the checkpoints form an inconsistent global checkpoint. Then there should be a checkpoint CPi^k that happens before [1] another checkpoint CPj^k. This implies that (i) there is at least a message m sent by Pi after CPi^k but received by Pj before CPj^k, and (ii) there is at least a message m sent by Pi before CPi^k but received by Pj after CPj^k. This can be proved in the following way:
It must be mentioned here that case (ii) stated above does not make CPi^k happen before CPj^k. Hence it is not mandatory that messages recorded "sent" in CPi^k should also have to be recorded "received" in CPj^k.
[Figure: timeline with Ps taking CPs^k at t1 and sending m at t2; Pr receiving m at t3 and taking CPr^k at t4.]
Figure 1. Message recorded received and not sent
Case (i): Let us consider figure 1 and a fault-free scenario where messages reach destinations
correctly.
Assumptions:
1. Message m not recorded sent
2. Message m recorded received
The following scenario is observed:
i. Message m is sent at t2
ii. Message m is received at t3
iii. Checkpoint CPs^k of Ps is taken at t1 (t1 < t2 by assumption 1)
iv. Checkpoint CPr^k of Pr is taken at t4 (t3 < t4 by assumption 2)
v. Since Ps takes checkpoint at t1 (by assumption 1 and step iii)
a. Ps has reached line 26 of algorithm via lines 16-25.
b. Ps has checked its consistency with other (n-1) processes including Pr in lines 18 –
25.
vi. In line 18 Ps sends its status and Pr receives it in line 19 of Pr's algorithm
a. Pr is in lines 19-25 and no discrepancies are noted.
b. Therefore, Pr reaches line 26 and hence takes checkpoint CPr^k (by iv), thereby
violating assumption 2 and scenarios ii and iv.
c. Message m reaches Pr and eventually gets rejected in lines 4-5 of procedure recv_sp
of Pr since m carries a later checkpoint index (by i and iii).
d. vi (b, c) contradicts assumption 2.
Thus, there can not be any message m that is not recorded sent but recorded received in the same
global checkpoint.
Alternative Proof:
With assumptions remaining the same, the following scenario is observed:
i. Message m is sent at t2
ii. Message m is received at t3
iii. Checkpoint CPs^k of Ps is taken at t1 (t1 < t2 by assumption 1)
iv. Checkpoint CPr^k of Pr is taken at t4 (t3 < t4 by assumption 2)
v. Assuming Pr takes checkpoint at t4 (by assumption 2 and step iv)
a. Pr has checked its consistency with other (n-1) processes including Ps in lines 18 – 25
thereby confirming that all messages sent by Ps have been received by Pr and vice
versa.
b. Pr has reached line 26 and taken checkpoint via lines 16-25.
c. v (a, b) contradicts assumption 1.
Thus, there can not be any message m that is not recorded sent but recorded received in the same
global checkpoint.
Case (ii) : Let us consider figure 2.
Pr t1 t3
CPr
k
m
Ps CP s
k
t2 t4
Figure 2. Message recorded sent and not received
Assumptions: 1. Message recorded sent. 2. Message not recorded received.
The following scenario is observed:
i. Message m is sent at t2
ii. Ps takes checkpoint at t4 (t2 < t4 by assumption 1)
iii. Pr takes checkpoint at t1
iv. Message m is received at t3 (t1 < t3 by assumption 2)
v. Assuming Ps takes checkpoint at t4
a. Ps reaches line 26 (and records sending of m (by (ii))) via lines 16 – 25.
b. Ps has checked its consistency with other (n-1) processes including Pr in lines 18 –
25.
vi. Similarly, when Pr takes checkpoint at t1
a. Pr reaches line 26 via lines 16 – 25.
b. Pr has checked its consistency with other (n-1) processes including Ps in lines 18 –
25.
c. Pr finds that message m from Ps is yet to be received by it (by iv)
d. Pr is in line 25 via lines 20 – 24 until m is actually received in line 24.
e. Pr can not reach line 26 and hence can not take checkpoint.
f. vi (e) contradicts assumption 2.
Hence, there can not be any message that is recorded ―sent‘ but not recorded ―received‖ in the
present checkpointing protocol.
5. RECOVERY
5.1 Approach to Recovery
Whenever consensus about the failure of a process is reached, it is also decided that processes
should rollback in order to restart from the last saved consistent state. Since not all processes are
dependent on the failed process in the concerned checkpointing interval so all of them need not
roll back. The processes that communicated with the failed process should roll back and they are
termed as being ―directly‖ dependent on the failed process. Still there are others who have
communications with the directly dependent processes. Hence recovery of the directly dependent
processes would affect these ―indirectly‖ dependent processes. So, they have to roll back also.
The task of finding whether a process is indirectly dependent on the faulty process has been taken
10. International Journal of UbiComp (IJU), Vol.6, No.4, October 2015
38
up using several methods in the literature. The technique pursued here is described with the help
of an example for better understanding.
Let us assume that in a system of 5 processes process2 is found to be faulty. The vectors used in
the checkpointing algorithm that save (i) messages sent to and (ii) messages received from are
sent by each process to all others. Let us consider figure 3 below and construct the above-said
vectors for all the five processes.
P0 CP
CP a
P1
b
P2 CP
CP
P3
CP c
P4
Figure 3. A scenario of process interactions via messages
The CP indicates the last consistent checkpoint of each process. represents the point where
failure is detected. The entry ―-1‖ is used to denote end.
Process id Message sent
to (pid)
Message received
from (pid)
0 -1 1, -1
1 0, 2, -1 -1
2 -1 1, -1
3 4, -1 -1
4 -1 3, -1
After the above vectors are available, each process builds the ―sr‖ data structure (an array used in
the Detect_Recovery algorithm) in a distributed fashion. In each process the ―sr‖ array looks like
the following array:
Process id Message sent to (pid)/
Message received from (pid)
0 1, -1
1 0, 2, -1
2 1, -1
3 4, -1
4 3, -1
11. International Journal of UbiComp (IJU), Vol.6, No.4, October 2015
39
Let us now find out the dependency of each of the processes: (considering the faulty process id to
be 2)
P0:
Searches own sr entry to find if 2 exists there.
Since 2 is not there, so search the entry of sr[1] since 1 occurs in P0‘s sr-entry. P0 keeps track
that it has searched its own sr-entry.
Gets 2 in sr[1].
Concludes that P0 has to roll back.
P1:
Searches own sr entry to find if 2 exists there.
Since 2 is there, concludes that P1 has to roll back.
P2:
Searches own sr entry to find if 2 exists there.
Since 2 is not there, so search the entry of sr[1] since 1 occurs in P2‘s sr-entry. P2 keeps track
that it has searched its own sr-entry.
Gets 2 in sr[1].
Concludes that P2 has to roll back.
P3:
Searches own sr entry to find if 2 exists there.
Since 2 is not there, so search the entry of sr[4] since 4 occurs in P3‘s sr-entry. P3 keeps track
that it has searched its own sr-entry.
Since 2 is not in sr[4], so search the entry of sr[3] since 3 occurs in P4‘s sr-entry. P3 keeps
track that it has searched sr-entry of 4.
sr[4] contains only 3 whose sr-entry has already been searched and 2 was not in sr[3].
Concludes that P3 does not have to roll back.
P4:
Searches own sr entry to find if 2 exists there.
Since 2 is not there, so search the entry of sr[3] since 3 occurs in P4‘s sr-entry. P4 keeps track
that it has searched its own sr-entry.
sr[3] contains only 4 whose sr-entry has already been searched and 2 was not there.
Concludes that P4 does not have to roll back.
Data structures used in the Algorithm for detecting recovery:
sr[n][n]: This array is constructed in each process after it gets the send-receive vectors of all
other processes. This array denotes the pids of processes to/from which a particular process Pi (i
<= n) has sent/received message during the current checkpointing interval. If process P1 has sent
messages to processes P6, P2, P0 and has received messages from P4 and P3, then sr[1][n] will
contain the elements as mentioned below:
6 2 0 4 3 -1
12. International Journal of UbiComp (IJU), Vol.6, No.4, October 2015
40
The –1 in sr[1][5] indicates that valid data for row 1 ends.
depends[]: This vector contains the process ids whose send-receive vectors have been checked by
a process to find whether it is dependent on the failed process. The end of this vector is indicated
by –1.
5.2 Algorithm for Detecting Recovery
1 Procedure Detect_recovery(Pi)
2 {
3 k := 0;
4 flag, flag1 := F;
5 while (sr[ownpid][k] = pid_faulty)
6 { flag := T; recover(Pi); }
7 k1, v1 := 0;
8 while (NOT flag1)
9 { key := sr[ownpid][k1];
10 if (key == -1)
11 flag1 := T;
12 else
13 { k2 := 0;
14 while ((sr[key][k2]<> pid_faulty) OR(sr[key][k2] <> -1))
15 k2 := k2 + 1;
16 if(sr[key][k2] == pid_faulty)
17 { flag := T; recover(Pi,1); }
18 else
19 { depends[v1] := key; v1 := v1 + 1;
20 depends[v1] := -1; k1 := k1 + 1; }
21 flag2, flag3 := F; key1, k4 := 0;
22 if (flag1)
23 {
24 while (NOT flag2)
25 { k3 := 0;
26 while (NOT flag3)
27 { key2 := sr[depends[key1]][k4];
28 if ((key2 <> ownpid) OR (key2 <> -1))
29 {while((sr[key2][k3]<>pid_faulty)OR(sr[key2][k3]<>-1))
30 k3 := k3 + 1;
31 if (sr[key2][k3] == pid_faulty)
32 { flag,flag2,flag3 := T; recover(Pi,1); }
33 else
34 { j := 0;
35 while ((depends[j] <> key2) OR (depends[j] <> 1))
36 j := j + 1;
37 if (depends[j] == -1)
38 { j := j – 1; depends[j] := key2;
39 j := j + 1; depends[j] := -1; }
40 k4 := 0; }
41 else { if (key2 == -1)
42 { key1 := key1 + 1; flag3 := T; k4 := 0; }
43 else { k4 := k4 + 1; flag3 := T; }
44 }
45 }
13. International Journal of UbiComp (IJU), Vol.6, No.4, October 2015
41
46 if (depends[key1] == -1) flag2 := T;
47 }
48 if (NOT flag)
49 recover (Pi,-1);
The concept of dependency is used in the above algorithm for recovery to minimize the number
of nodes that roll back their computation. Only those nodes that have a dependency on the failed
node since the latter node‘s last checkpoint is required to roll back to maintain global consistency.
After the nodes roll back to their last saved consistent state, they have to retrace their computation
that has been undone due to rollback. Types of messages that have to be handled are:
1. Orphan messages: This situation will arise when the sender rolls back to a state prior to
sending while the receiver still has the record of its reception. However these messages can
not arise because whenever sender Pi rolls back, receiver Pj also rolls back because by the
above algorithm Pj becomes dependent on Pi.
2. Lost messages: This situation will arise when the receiver rolls back to a state prior to
reception of a message that is being still recorded as sent by the sender. However these
messages can not arise because whenever receiver Pi rolls back, sender Pj also rolls back
because by the above algorithm Pj becomes dependent on Pi.
Since the above algorithm considers both the ―send‖ as well as the ―receive‖ vectors of a process
in calculating dependency, so logging of messages by sender is not necessary as was the case in
Prakash et. al [14].
6. WORKING OF ADHOCNET-BASED TMR
The above sections 3 and 4 describe the working of the checkpointing and the recovery protocols.
This section describes the working of the TMR in AdocNet. Let us consider the following figure
4 that depicts an AdocNet and the various TMR groups it has. The network has 6 MHs with
communication links as shown.
N4
N1 N3
N5
N2 N6
Figure 4. Example Wireless Adhoc Network
Let us consider the following wrt figure 4.
TMR group1 or TMR1: N1 (main), N2 (primary) and N3 (secondary)
TMR group2 or TMR2: N3 (main), N4 (primary) and N5 (secondary)
TMR group3 or TMR3: N4 (main), N5 (primary) and N6 (secondary)
And so on.
14. International Journal of UbiComp (IJU), Vol.6, No.4, October 2015
42
Each MH knows its role in the group and also about the other two MHs belonging to its group.
Hence, N3 is aware that it acts as secondary in TMR1 and as main in TMR2. The responsibilities
associated with the roles are different.
6.1 Checkpointing
The main MHs of each TMR are the only initiators in a group and can initiate checkpointing.
Hence N1 and N3 and N4 are the initiators here. According to the checkpointing algorithm
described in the above section, checkpoint requests are received by all MHs and checkpoints
taken accordingly depending on the activities in the current checkpointing interval. The MHs
execute processes independently and the processes exchange messages frequently. The message
exchange builds a dependence relation among them. A process executing on N1 may also send
message to N3, belonging to the same TMR. Since any MH may take checkpoint in a particular
checkpointing interval, the copies of its checkpoint are to be kept in the main and in the primary
of that TMR. Whenever an MH is either main or primary, only one copy has to be sent to the
other. But if an MH is secondary, then a copy each is sent to its main and primary. This may be
an overhead in the network. However if the checkpointing interval can be chosen judiciously, this
extra circulation of checkpoints would not be that much of an overhead. Whenever a new
checkpoint is to be stored, the previous one is deleted in the corresponding main or primary. In
case of failure of any one of the MHs in a TMR, that TMR reduces to Dual Modular Redundancy
(DMR). In that case copies of checkpoints are with both the MHs in that group.
6.2 Recovery
Once the failed MH is identified (possibly after some time-out since message sending and non-
receipt of acknowledgement), the processes in the system go to the recovery mode and exchange
status information with each other. According to the recovery algorithm described above, a
process is able to identify whether it should recover or not. It then proceeds to collect its
checkpoint if it is secondary, otherwise the checkpoint is with itself only. Role change will
happen to an MH if any other MH in its TMR is detected to have failed.
6.3 An Example Scenario
Suppose N3 is detected to have failed. After subsequent status exchange, it is found that N2 and
N5 are dependent on N3. The latest checkpoints of N2 and N5 are with (N1 and N2) and (N4 and
N5) respectively. Hence N2 and N5 have their checkpoints. N3 was the main in TMR2 with N4
(primary) and N5 (secondary). Henceforth, N4 becomes main and N5 becomes primary in TMR2.
Another important issue that needs to be considered in this changed scenario is that, henceforth
N4 will take up the role of N3. Hence the MHs is the network may be made aware that the process
running on N3 would now execute on N4. This is an additional task for N4. However, generally,
this should not pose any hindrance to the working scenario in the network.
7. CONCLUSION
The checkpointing algorithm proposed in this paper constructs consistent checkpoints in a
distributed manner. Hence, forced checkpoints as well as useless checkpoints are never taken.
15. International Journal of UbiComp (IJU), Vol.6, No.4, October 2015
43
The checkpointing protocol described in the present work also eliminates the occurrences of both
missing and orphan messages. Thus, each and every checkpoint taken by a process contributes to
a consistent global snapshot and hence only the last global snapshot is required to be retained.
The overhead of the present checkpointing protocol is the (n2
) number of messages required
during checkpointing (where n is the total number of processes). Though other algorithms have
(n) number of messages for the same but drawbacks like checkpoint commit time, failure of
checkpoint coordinator, handling multiple checkpoint initiations are associated with them.
Recovery of self is decided by each of the processes after collecting system-wide information.
The dependence relation among the processes can be tracked on-line. A minimum number of
processes is required to recover depending on their relation with the failed process.
Moreover this fault tolerance technique of checkpointing and recovery is based on TMR concept
and that too in a wireless adhoc network. This paper proposes the approach towards obtaining
fault tolerance using checkpointing and recovery on wireless adhoc network based TMR. The
technique adopted is able to tolerate both the transient and permanent faults. The number of faults
that can be tolerated is maximum two in each group of the TMR MHs in the wireless adhoc
network.
This work does not consider node mobility in the adhoc network. However, the proposal can be
extended to mobile ad hoc network.
References:
1. K. M. Chandy, & L. Lamport, (1985) Distributed Snapshots : Determining Global States of
Distributed Systems, ACM Trans. On Computer Systems, Vol. 3, No.1, pp. 63-75.
2. G. Cao & M. Singhal, (1998) On Coordinated Checkpointing in Distributed Systems, IEEE Trans. on
Parallel & Distributed Systems, Vol. 9, No. 12, pp. 1213-1225.
3. M. Elnozahy, L. Alvisi, Y. Wang & D. B. Johnson, (1999) A Survey of Rollback-Recovery Protocols
in Message-Passing Systems, Report - CMU-CS-99-148.
4. I. C. Garcia & L. E. Buzato, (1999) Progressive Construction of Consistent Global Checkpoints,
ICDCS.
5. S. Kalaiselvi, & V. Rajaraman, (1997) Checkpointing Algorithm for Parallel Computers based on
Bounded Clock Drifts, Computer Science & Informatics, Vol. 27, No. 3, pp. 7-11.
6. R. Koo & S. Toueg, (1987) Checkpointing and Rollback Recovery for Distributed Systems,
IEEE Trans. on Software Engineering, Vol. SE-13, No.1, pp. 23-31.
7. D. Manivannan, R. H. B. Netzer & M. Singhal, (1997) Finding Consistent Global Checkpoints in
a Distributed Computation, IEEE Trans. On Parallel & Distributed Systems, Vol.8, No.6, pp. 623-
627.
8. D. Manivannan, Quasi-Synchronous Checkpointing:Models, Characterization, and Classification,
IEEE Trans. on Parallel and Distributed Systems, Vol.10, No.7, pp703-713.
9. Sarmistha Neogy, Anupam Sinha & P. K. Das, (2010), Checkpointing with Synchronized Clocks in
Distributed Systems, International Journal of UbiComp (IJU), Vol. 1, No.2, pp. 65 – 91
10. S. Neogy, A. Sinha & P. K. Das, (2001) Checkpoint processing in Distributed Systems Software
Using Synchronized Clocks, Proceedings of the IEEE Sponsored International Conference on
Information Technology: Coding and Computing: ITCC 2001, pp. 555-559.
11. S. Neogy, A. Sinha & P. K. Das, (2004) CCUML: A Checkpointing Protocol for Distributed System
Processes, Proceedings of IEEE TENCON 2004, pp. B553 – B556
12. R. H. B. Netzer & J. Xu, (1995) Necessary and Sufficient Conditions for consistent global snapshots,
IEEE Trans. On Parallel & Distributed Systems, 6(2), pp. 165-169.
16. International Journal of UbiComp (IJU), Vol.6, No.4, October 2015
44
13. N. Neves & K. W. Fuchs, Using Time to Improve the Performance of Coordinated Checkpointing,
http://composer.ecn.purdue.edu/~fuchs/fuchs/ipdsNN96.ps
14. N. NeveS & K. W. Fuchs, Coordinated Checkpointing without Direct Coordination,
http://composer.ecn.purdue.edu/~fuchs/fuchs
15. R. Prakash & M. Singhal, (1996) Low-Cost Checkpointing and Failure Recovery in Mobile
Computing Systems, IEEE Trans. On Parallel & Distributed Systems, Vol. 7, No. 10, pp.1035-1048.
16. P. Ramanathan & K. G. Shin, (1993) Use of Common Time Base for Checkpointing and Rollback
Recovery in a Distributed System, IEEE Trans. On Software Engg., Vol.19, No.6, pp. 571-583.
17. B. Randell, (1975) System Structure for Software Fault Tolerance, IEEE Trans. On Software Engg.,
Vol. SE-1, No.2, pp. 220-232.
18. A. SinhA, P. K. Das & D. Basu, (1998) Implementation and Timing Analysis of Clock
Synchronization on a Transputer based replicated system, Information & Software Technology, 40,
pp. 291-309.
19. T. K. Srikanth, & S. Toueg, (1987) Optimal Clock Synchronization, JACM, Vol. 34, No.3, pp. 626-
645.
20. R. E. Strom & S. Yemini, (1985) Optimistic Recovery in Distributed Systems, ACM Transactions on
Computer Systems, Vol.3, No.3, pp. 204-226.
21. Z. Tong, Y. K. Richard & W. T. Tsai, (1992) Rollback Recovery in Distributed Systems Using
Loosely Synchronized Clocks, IEEE Trans. On Parallel & Distributed Systems, Vol. 3, No.2, pp.
246-251.
22. J. Tsai & S. Kuo, (1998) Theoretical Analysis for Communication-Induced Checkpointing
Protocols with Rollback-Dependency Trackability, IEEE Trans. On Parallel & Distributed
Systems, Vol.9, No.10, pp. 963-971.
23. J. Tsai, Y. Wang & S. Kuo, (1999) Evaluations of domino-free communication-induced
checkpointing protocols, Information Processing Letters 69, pp. 31-37.
24. Y. M. Wang, A. Lowry & W. K. Fuchs, (1994) Consistent Global Checkpoints based on dependency
tracking, Information Processing Letters vol. 50, no. 4, pp. 223-230
25. R. E. Lyons, & W. Vanderkulk, (1962) The Use of Triple Modular Redundancy to Improve Computer
Reliability, IBM Journal, pp. 200-209
26. C. J. Hou & K. G. Shon, (1994) Incorporation of Optimal Time Outs Into Distributed Real-Time
Load Sharing, IEEE Trans. on Computers, Vol.43, No.5, pp. 528-547
27. K. S. Byun and J.H. Kim, (2001) Two-Tier Coordinated Checkpointg Algorithm For Cellular
Networks, ICCIS
28. S. Neogy, (2004) A Checkpointing Protocol for a Minimum set of Processes in Mobile Computing
Systems, Proceedings of the IASTED International Conference on Parallel and Distributed
Computing Systems (IASTED PDCS 2004), pp. 263-268
29. R. C. Gass, B. Gupta, An Efficient Checkpointing Scheme for Mobile Computing Systems, Computer
Science Department of Southern Illinois University
30. S. Neogy, (2007) WTMR – A new Fault Tolerance Technique for Wireless and Mobile Computing
Systems, Proceedings of the 11th
International Workshop on Future Trends of Distributed Computing
Systems (FTDCS 2007), pp. 130 – 137
31. C. Chowdhury, S. Neogy, (2007) Consistent Checkpointing, Recovery Protocol for Minimal number
of Nodes in Mobile Computing System, Lecture Notes in Computer Science, 2007, Volume
4873, High Performance Computing – HiPC 2007, pp. 599-611
32. Chandreyee Chowdhury, Sarmistha Neogy, (2009) Checkpointing using Mobile Agents for Mobile
Computing System, International Journal of Recent Trends in Engineering, ISSN 1797-9617, Vol. 1,
No.2, May 2009, Academy Publishers, pp. 26 – 29
33. S. Biswas, T. Nag, S. Neogy, (2014) Trust Based Energy Efficient Detection and Avoidance of Black
Hole Attack to Ensure Secure Routing in MANET, IEEE Xplore International Conference on
Applications and Innovations in Mobile Computing (AIMoC 2014), pp. 157 – 164