Fault tolerance is the dynamic method used to keep interconnected systems working together and to sustain reliability and availability in distributed systems.
2. Faulty System
◂ A faulty system can cause human and economic losses, for example in air and rail traffic control or in telecommunications.
◂ A reliable fault tolerance mechanism is needed to reduce these risks to a minimum.
3. Fault Tolerance
◂ Fault tolerance is the dynamic method used to keep interconnected systems working together and to sustain reliability and availability in distributed systems.
◂ An efficient fault tolerance mechanism helps detect faults and, where possible, recover from them.
4. Useful Requirements In The Fault Tolerance System
◂ Availability: the system is in a ready state and able to deliver its functions to its users.
◂ Reliability: the ability of a computer system to run continuously without failure. A highly reliable system works constantly over a long period of time without interruption.
◂ Safety: when the system fails to carry out its processes correctly and its operations are incorrect, no catastrophic event happens.
◂ Maintainability: how easily a failed system can be repaired. A highly maintainable system can be restored quickly, especially if failures can be detected and fixed automatically.
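The slides describe availability and reliability only in words. A common way to quantify them is steady-state availability, A = MTTF / (MTTF + MTTR). This formula is added here as an assumption and does not come from the slides. MTTF is the mean time to failure, which is larger for a more reliable system. MTTR is the mean time to repair, which is smaller for a more maintainable system. A minimal sketch with hypothetical figures:

```python
# Sketch only: the MTTF/MTTR model and the figures below are assumptions,
# not values from the slides.
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is ready to serve."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(a: float) -> float:
    """Expected number of hours per year during which the system is unavailable."""
    return (1.0 - a) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical node: fails on average every 999 hours, repaired in 1 hour.
    a = availability(999.0, 1.0)
    print(f"availability = {a:.3f}")
    print(f"downtime     = {downtime_hours_per_year(a):.2f} hours/year")
```

With these hypothetical figures the node is available 99.9% of the time. That still means about 8.76 hours of downtime per year.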
5. Errors
◂ The types of failure that a fault-tolerant system has to handle are:
◂ Performance: the hardware or software components cannot meet the demands of the user.
◂ Omission: components fail to carry out the actions of some of the commands they receive.
◂ Timing: components carry out the actions of a command, but not at the right time.
◂ Crash: certain components stop completely, give no response, and cannot be repaired.
◂ Fail-stop: when the software detects an error, it ends the process or action. This is the easiest failure to handle, but its simplicity sometimes prevents it from handling real situations.
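To make these failure types more concrete, here is a minimal sketch of how a client might classify what it observes after sending one command to a component. The `Observation` type, the deadline and the labels are hypothetical. From a single request, a client cannot tell a crash from an omission, so both map to the same label. Performance failures need measurements over many requests, so they are not covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """What a client saw after sending one command (hypothetical model)."""
    replied: bool                  # did any reply arrive at all?
    latency_s: Optional[float]     # seconds until the reply, None if no reply
    stopped_cleanly: bool = False  # component announced "error detected, halting"


def classify(obs: Observation, deadline_s: float) -> str:
    """Map one observation to the failure types listed on the slide."""
    if obs.stopped_cleanly:
        # The component detected its own error and halted: fail-stop.
        return "fail-stop"
    if not obs.replied:
        # For a single request, silence could be a crash or an omission;
        # repeated silence over many requests points to a crash.
        return "crash or omission"
    if obs.latency_s is not None and obs.latency_s > deadline_s:
        # The command was carried out, but not at the right time.
        return "timing"
    return "ok"


if __name__ == "__main__":
    print(classify(Observation(replied=True, latency_s=2.0), deadline_s=0.5))
```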
6. Forms of error
◂ Permanent error
◂ Temporary error
◂ Periodic error
7. Permanent error
◂ These errors cause lasting damage to software components, resulting in a permanent error or damage to the program that prevents it from running or functioning.
◂ In this case the program is restarted; an example is when a program crashes.
8. Temporary error
◂ This results in only brief damage to the software component. The damage is resolved after some time, and the software then continues to work normally.
Periodic error
◂ These are errors that occur occasionally.
◂ In dealing with this type of error, one of the conflicting programs is exited to resolve the conflict.
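A temporary error clears up after some time, so a common software response is to wait and try again. The sketch below retries an operation with exponential backoff. `TemporaryError`, `make_flaky`, the attempt limit and the delays are all hypothetical. If an error persists through every attempt, the sketch gives up and leaves it to the caller to treat it as permanent, for example by restarting the program as described for permanent errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TemporaryError(Exception):
    """Hypothetical error type for faults that are expected to clear up by themselves."""


def retry(op: Callable[[], T], attempts: int = 4, base_delay_s: float = 0.01) -> T:
    """Call op(), retrying after temporary errors with exponentially growing waits."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except TemporaryError:
            if attempt == attempts - 1:
                # Still failing after the last attempt: give up and let the caller
                # treat it as a permanent error (for example, restart the program).
                raise
            time.sleep(base_delay_s * (2 ** attempt))
    raise AssertionError("unreachable")


def make_flaky(failures: int) -> Callable[[], str]:
    """Hypothetical operation that fails `failures` times, then succeeds."""
    remaining = failures

    def op() -> str:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise TemporaryError("resource temporarily unavailable")
        return "done"

    return op


if __name__ == "__main__":
    print(retry(make_flaky(2)))  # succeeds on the third call
```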
9. Fault tolerance mechanisms can be divided into three categories:
◂ Hardware fault tolerance
◂ Software fault tolerance
◂ System fault tolerance
Presented by:
Zoha Akhtar
10. Hardware fault tolerance
◂ This involves providing supplementary backup hardware such as CPUs, memory, hard disks, power supply units, etc.
◂ It supports the hardware by providing a basic hardware backup system; it cannot stop or detect errors.
◂ There are two approaches to hardware fault recovery, namely fault masking and dynamic recovery.
11. Fault Masking
◂ This is an important redundancy method that fully masks faults within a set of redundant units or components.
◂ Other identical units carry out the same tasks, and voting on their outputs removes the errors created by a defective module.
◂ The most commonly used fault masking technique is Triple Modular Redundancy (TMR).
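Triple Modular Redundancy masks one faulty unit by running three identical units and passing their outputs through a majority voter. In the minimal sketch below, the units are plain functions and the voter is a software function. In real TMR the voter is usually hardware. `good_unit` and `faulty_unit` are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def majority_vote(outputs: Sequence[T]) -> T:
    """Return the value produced by a strict majority of the units."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority: too many units disagree")
    return value


def tmr(units: Sequence[Callable[[int], T]], x: int) -> T:
    """Run the same input through three redundant units and vote on the result."""
    if len(units) != 3:
        raise ValueError("TMR uses exactly three units")
    return majority_vote([unit(x) for unit in units])


def good_unit(x: int) -> int:
    # Hypothetical healthy module.
    return x * x


def faulty_unit(x: int) -> int:
    # Hypothetical defective module: flips the lowest bit of the result.
    return (x * x) ^ 1


if __name__ == "__main__":
    print(tmr([good_unit, good_unit, faulty_unit], 7))  # the fault is masked
```

If two units failed and happened to produce the same wrong value, the voter would output that wrong value. This is why TMR masks only a single faulty unit.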
12. Dynamic Recovery
◂ In dynamic recovery, a special mechanism is needed to detect faults in the units, switch out a faulty module, put in a spare, and carry out the software actions necessary to restore and continue the computation, such as rollback, initialization, retry, and restart.
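The steps of dynamic recovery can be sketched as follows: detect that the active module has failed, switch it out, put in a spare, and retry the computation. `ModuleFault`, `broken_module` and `healthy_module` are hypothetical. Only the retry action is shown; rollback, initialization and restart are left out.

```python
from typing import Callable, List

Module = Callable[[int], int]


class ModuleFault(Exception):
    """Hypothetical signal that a module has been detected as faulty."""


class DynamicRecovery:
    """Route work to an active module and swap in a spare when it faults."""

    def __init__(self, active: Module, spares: List[Module]):
        self.active = active
        self.spares = list(spares)
        self.switches = 0

    def run(self, x: int) -> int:
        while True:
            try:
                return self.active(x)
            except ModuleFault:
                if not self.spares:
                    raise  # no spare left: recovery is impossible
                # Switch out the faulty module, put in a spare, then retry
                # the same computation on the spare.
                self.active = self.spares.pop(0)
                self.switches += 1


def broken_module(x: int) -> int:
    # Hypothetical module whose fault detector always fires.
    raise ModuleFault("parity error")


def healthy_module(x: int) -> int:
    # Hypothetical spare module.
    return x + 1


if __name__ == "__main__":
    system = DynamicRecovery(broken_module, [healthy_module])
    print(system.run(41), system.switches)
```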
13. Software Fault Tolerance
◂ This is special software designed to tolerate errors that originate from software or programming mistakes.
◂ Software fault tolerance also includes checkpoint storage and rollback recovery. Checkpoints are like a safe state, or snapshot, of the entire system while it is working correctly.
14. System Fault Tolerance
◂ This is a complete system that does more than store checkpoints: it detects errors in applications and automatically stores memory blocks and program checkpoints.
◂ When a fault or an error occurs, the system provides a mechanism that corrects it.
16. Distributed System
◂ Distributed systems are systems that do not share memory or a clock; in a distributed system, nodes communicate by exchanging information over a communication medium.
◂ The different computers in a distributed system have their own memory and OS, and local resources are owned by the node that uses them.
18. How does it work?
◂ In a distributed system, a set of rules is executed to synchronize the actions of different processes on a communication network, thereby forming a distinct set of related tasks.
◂ The independent systems or computers access resources remotely or locally in the distributed communication environment.
19. Cont…
◂ The user in a distributed environment is not aware of the multiple interconnected systems that ensure the task is carried out accurately.
◂ In a distributed system, no single system carries the load of the entire system when processing a task.
21. Distributed System Architecture
◂ It is built on existing OS and network software.
◂ A distributed system is a collection of self-sufficient computers linked via a computer network and distribution middleware.
◂ The distribution middleware enables the computers to manage and share the resources of the system, so that users see it as a single, combined computing infrastructure.
22. Conti…
◂ Middleware is the link that joins distributed applications across different geographical locations, computing hardware, network technologies, operating systems, and programming languages.
◂ The middleware delivers standard services such as naming, concurrency control, event distribution, security, and authorization.
24. 1. Fully connected network
◂ It is a network in which every node is connected to every other node.
25. Disadvantage
◂ A file descriptor is an abstract indicator (handle) used to access a file or another input/output resource, such as a network connection.
◂ Hence, the ability of the networked system to keep functioning well is limited by the number of nodes each node can stay connected to.
◂ When a new computer is added, it physically increases the number of nodes that every node must be connected to.
◂ Because of the increase in nodes, the number of file descriptors and the difficulty for each node to communicate increase heavily.
26. Conti…
◂ Fully connected networks are reliable, because a message sent from one node to another goes through a single direct link.
◂ When a node or a link fails, the other nodes in the network can still communicate with each other.
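The scaling problem on these slides can be quantified. In a fully connected network of n nodes, each node keeps n - 1 connections, so roughly n - 1 socket file descriptors. The network as a whole needs n(n - 1)/2 links. The node counts below are arbitrary examples.

```python
def per_node_connections(n: int) -> int:
    """Connections (and socket file descriptors) each node holds in a full mesh."""
    return n - 1


def total_links(n: int) -> int:
    """Number of point-to-point links in a fully connected network of n nodes."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Arbitrary example sizes.
    for n in (4, 10, 100):
        print(n, per_node_connections(n), total_links(n))
```

Going from 10 to 100 nodes multiplies the number of links by 110, from 45 to 4950. The per-node connection count grows from 9 to 99.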
27. 2. Partially connected network
◂ Some nodes have direct links while others don't.
◂ Some models of partially connected networks are:
Tree structured network
Ring structured network
Multi-access bus network
Star structured networks
28. Tree structured network
◂ This is a network with a hierarchy.
◂ Each node in the network has a fixed number of nodes attached to it at the next level down the tree.
◂ In this network, messages transmitted from a parent to its child nodes go through one link.
29. Ring structured network
◂ Each node is connected to its two neighbouring nodes in the network, creating a path along which signals are exchanged between the connected nodes.
◂ As new nodes are added to the network, the transmission delay becomes longer.
◂ If a node fails, every other node in the network can become inaccessible.
30. Multi-access bus network
◂ Nodes are connected to each other through a shared communication link, “a bus”.
◂ If the bus link connecting the nodes fails, the nodes can no longer connect to each other.
◂ The performance of the network drops as more nodes are added to the system or when heavy traffic occurs.
31. Star structured networks
◂ When the main (central) node fails, the entire networked system stops functioning and collapses.
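The failure behaviour of these topologies can be checked with a small reachability test: remove the failed node, then check whether every surviving node can still reach every other one. The sketch assumes a one-directional ring in which messages travel one way. That is the case where a single failed node cuts the ring, as the slide describes; a two-directional ring survives one node failure. The bus is not modelled, because its single point of failure is the shared link rather than a node. The topology builders are hypothetical helpers.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # node -> nodes it can send to directly


def full_mesh(n: int) -> Graph:
    return {i: {j for j in range(n) if j != i} for i in range(n)}


def star(n: int) -> Graph:
    # Node 0 is the central hub; nodes 1..n-1 talk only to the hub.
    g: Graph = {0: set(range(1, n))}
    for i in range(1, n):
        g[i] = {0}
    return g


def one_way_ring(n: int) -> Graph:
    # Assumption: each node forwards only to its successor.
    return {i: {(i + 1) % n} for i in range(n)}


def reachable(g: Graph, start: int, failed: int) -> Set[int]:
    """Nodes reachable from start by breadth-first search, skipping the failed node."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in g[node]:
            if nxt != failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def survives(g: Graph, failed: int) -> bool:
    """True if every surviving node can still reach every other surviving node."""
    alive = set(g) - {failed}
    return all(reachable(g, s, failed) == alive for s in alive)


if __name__ == "__main__":
    print("mesh, node 0 fails:", survives(full_mesh(5), 0))
    print("star, hub fails:   ", survives(star(5), 0))
    print("star, leaf fails:  ", survives(star(5), 3))
    print("ring, node 2 fails:", survives(one_way_ring(5), 2))
```

For the star, only the failure of the hub is fatal. In the one-way ring, the node just before the failed one can no longer reach anyone.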
33. Replication Based Fault Tolerance Technique
◂ The replication-based fault tolerance technique is one of the most popular methods.
◂ This technique replicates the data on several other systems.
◂ A request can be sent to any one of the replica systems. In this way, if one or more nodes fail, the whole system does not stop functioning.
◂ Replication adds redundancy to a system.
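The idea that any replica can serve a request can be sketched as a client that writes every update to all replicas it can reach. On a read, it tries the replicas in turn until one answers. `Replica`, its `up` flag and `ReplicatedClient` are hypothetical stand-ins for networked servers. A replica that is down simply misses the update, which leads to the consistency issue raised on the following slides.

```python
from typing import Dict, List, Optional


class ReplicaDown(Exception):
    """Raised when a replica does not answer (hypothetical)."""


class Replica:
    def __init__(self, name: str):
        self.name = name
        self.up = True  # hypothetical switch used to simulate a failed node
        self.data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        if not self.up:
            raise ReplicaDown(self.name)
        self.data[key] = value

    def get(self, key: str) -> Optional[str]:
        if not self.up:
            raise ReplicaDown(self.name)
        return self.data.get(key)


class ReplicatedClient:
    def __init__(self, replicas: List[Replica]):
        self.replicas = replicas

    def put(self, key: str, value: str) -> int:
        """Write to every reachable replica; return how many accepted the write."""
        accepted = 0
        for r in self.replicas:
            try:
                r.put(key, value)
                accepted += 1
            except ReplicaDown:
                pass  # this replica misses the update (a consistency problem)
        if accepted == 0:
            raise ReplicaDown("all replicas are down")
        return accepted

    def get(self, key: str) -> Optional[str]:
        """Read from the first replica that answers."""
        for r in self.replicas:
            try:
                return r.get(key)
            except ReplicaDown:
                continue
        raise ReplicaDown("all replicas are down")
```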
35. Phases In The Replication Protocol
There are different phases in the replication protocol, which are:
Client contact
Server coordination
Execution
Agreement coordination
Client response
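To show how these phases fit together, here is a minimal primary-backup sketch; primary-backup is one of the protocols named on the next slide. With a single primary, server coordination is trivial, because the primary alone orders the requests. Agreement coordination is the primary waiting for every backup to acknowledge the update. `Primary`, `Backup` and the `trace` list are hypothetical, and failure of the primary itself is not handled.

```python
from typing import Dict, List, Tuple


class Backup:
    def __init__(self) -> None:
        self.state: Dict[str, int] = {}

    def apply(self, key: str, value: int) -> bool:
        self.state[key] = value
        return True  # acknowledgement sent back to the primary


class Primary:
    def __init__(self, backups: List[Backup]):
        self.state: Dict[str, int] = {}
        self.backups = backups
        self.trace: List[str] = []  # records the phases for illustration

    def handle(self, key: str, delta: int) -> Tuple[str, int]:
        # 1. Client contact: the client sends its request to the primary.
        self.trace.append("client contact")
        # 2. Server coordination: the primary alone decides the order,
        #    so requests are simply accepted in arrival order.
        self.trace.append("server coordination")
        # 3. Execution: only the primary executes the operation.
        new_value = self.state.get(key, 0) + delta
        self.state[key] = new_value
        self.trace.append("execution")
        # 4. Agreement coordination: push the result to every backup and
        #    wait until all of them acknowledge it.
        if not all(b.apply(key, new_value) for b in self.backups):
            raise RuntimeError("a backup did not acknowledge the update")
        self.trace.append("agreement coordination")
        # 5. Client response: reply only once the backups are up to date.
        self.trace.append("client response")
        return key, new_value
```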
36. Issues in replication based techniques
◂ Degree or number of replicas: Replication techniques use protocols to replicate data or an object; such protocols include primary-backup replication, voting, and primary-per-partition replication.
◂ Consistency: Several copies of the same entity create a consistency problem, because updates can be made by any of the users. The consistency of data is ensured by criteria such as linearizability, sequential consistency, etc.
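The voting protocol mentioned above can be sketched with quorums. With N replicas, a write must reach W of them and a read consults R of them. Choosing R + W > N guarantees that every read quorum overlaps the most recent write quorum, and version numbers let the reader pick the newest value. N = 5, W = 3, R = 3 and the replica sets below are illustrative assumptions, not values from the slides. For simplicity, a single counter stands in for the version number that a real client would learn from a read quorum.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Versioned = Tuple[int, str]  # (version number, value)


class QuorumStore:
    def __init__(self, n: int, w: int, r: int):
        if r + w <= n:
            raise ValueError("need R + W > N so read and write quorums overlap")
        self.n, self.w, self.r = n, w, r
        self.replicas: List[Dict[str, Versioned]] = [{} for _ in range(n)]
        self.version = 0  # simplification: one global version counter

    def write(self, key: str, value: str, targets: Sequence[int]) -> None:
        """Write to the replicas listed in `targets` (at least W of them)."""
        if len(set(targets)) < self.w:
            raise ValueError("write quorum not reached")
        self.version += 1
        for i in set(targets):
            self.replicas[i][key] = (self.version, value)

    def read(self, key: str, targets: Sequence[int]) -> str:
        """Read from at least R replicas and return the highest-versioned value."""
        if len(set(targets)) < self.r:
            raise ValueError("read quorum not reached")
        answers = [self.replicas[i][key] for i in set(targets) if key in self.replicas[i]]
        if not answers:
            raise KeyError(key)
        return max(answers)[1]


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Brute-force check that every W-subset meets every R-subset of n replicas."""
    nodes = range(n)
    return all(
        set(ws) & set(rs)
        for ws in itertools.combinations(nodes, w)
        for rs in itertools.combinations(nodes, r)
    )
```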
37. Process Level Redundancy Technique
◂ This fault tolerance technique is often used for faults that disappear without anything being done to remedy the situation; this kind of fault is known as a transient fault.
◂ Transient faults occur when there is a temporary malfunction in a system component, or sometimes because of environmental interference. Transient faults are hard to diagnose and handle, but they are less severe in nature.
◂ To handle transient faults, software-based fault tolerance techniques such as Process-Level Redundancy (PLR) are used, because hardware-based fault tolerance techniques are more expensive to deploy.
38. ◂ Redundancy at the process level lets the OS easily schedule processes across all available hardware resources.
◂ PLR provides improved performance over existing software transient fault tolerance techniques, with a 16.9% overhead for fault detection.
◂ PLR uses a software-centric approach, which shifts the focus from guaranteeing correct hardware execution to ensuring correct software execution.
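The core idea of PLR is to run redundant copies of a program and compare their outputs to detect a transient fault. The sketch below only simulates that comparison inside one Python process, so it is not PLR's actual implementation, which works with real OS processes. `run_replica` stands in for one redundant process, and the bit flip is a hypothetical injected transient fault. When the outputs disagree, the computation is simply re-executed, on the assumption that a transient fault will not repeat.

```python
from typing import List, Optional


def compute(data: List[int]) -> int:
    """The program being protected (hypothetical): a weighted checksum."""
    return sum(x * (i + 1) for i, x in enumerate(data))


def run_replica(data: List[int], flip_bit: Optional[int] = None) -> int:
    """One redundant execution; optionally corrupt the result to mimic a transient fault."""
    result = compute(data)
    if flip_bit is not None:
        result ^= 1 << flip_bit
    return result


def detect(outputs: List[int]) -> bool:
    """True if all redundant executions agree, i.e. no fault was detected."""
    return len(set(outputs)) == 1


def plr(data: List[int], injected: List[Optional[int]]) -> int:
    """Run one replica per entry in `injected`, compare, and re-run on mismatch."""
    outputs = [run_replica(data, fault) for fault in injected]
    if detect(outputs):
        return outputs[0]
    # Fault detected: assume it was transient and re-execute every replica.
    outputs = [run_replica(data) for _ in injected]
    if not detect(outputs):
        raise RuntimeError("outputs still disagree: the fault is not transient")
    return outputs[0]


if __name__ == "__main__":
    print(plr([3, 1, 2], [None, 0]))  # second replica hit by a bit flip
```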
41. Check Pointing and Roll Back:
◂ This is a popular technique whose first part, the “checkpoint”, stores the current state of the system from time to time.
◂ The checkpoint information is stored on a stable storage device so that the system can easily roll back when a node fails. The information stored includes the environment, the process state, the values of the registers, etc.
◂ This information is very useful if a complete recovery needs to be done.
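Here is a minimal sketch of checkpointing and rollback. It assumes the whole process state fits in a small dictionary and uses a local JSON file as a stand-in for stable storage. Writing to a temporary file and then renaming it keeps the last good checkpoint intact if the process fails mid-write. The state fields, `run` and the simulated crash are hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Dict

State = Dict[str, Any]


def save_checkpoint(state: State, path: str) -> None:
    """Store the state: write a temp file, flush it to disk, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # the rename is atomic on POSIX systems


def load_checkpoint(path: str) -> State:
    """Roll back: reload the last checkpoint after a failure."""
    with open(path) as f:
        return json.load(f)


def run(steps: int, path: str, checkpoint_every: int, crash_at: int = -1) -> State:
    """Process work items 0..steps-1, checkpointing periodically (hypothetical job)."""
    state: State = {"next_item": 0, "total": 0}
    if os.path.exists(path):
        state = load_checkpoint(path)  # resume from the last checkpoint
    while state["next_item"] < steps:
        if state["next_item"] == crash_at:
            raise RuntimeError("simulated node failure")
        state["total"] += state["next_item"]
        state["next_item"] += 1
        if state["next_item"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    return state
```

Suppose `run(10, path, 3, crash_at=7)` fails. The file still holds the checkpoint taken after item 5. Calling `run(10, path, 3)` again rolls back to that state and redoes only the work done since the checkpoint.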
43. The two best-known types of rollback recovery
◂ Checkpoint-based rollback recovery technique: uses the checkpoint states that it has stored on a stable storage device.
◂ Log-based rollback recovery technique: combines checkpointing with logging of events.
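Log-based rollback recovery can be sketched as follows. Besides taking occasional checkpoints, the process appends every event, here each incoming deposit, to a log on stable storage before applying it. After a failure, a new incarnation restores the checkpoint and replays only the events logged since then. An in-memory `StableStorage` object stands in for stable storage, and the deposit-counting state is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StableStorage:
    """Stand-in for stable storage: it survives the simulated crash."""
    checkpoint: Tuple[int, int] = (0, 0)  # (balance, index of next log entry)
    log: List[int] = field(default_factory=list)  # deposits, in arrival order


class Process:
    def __init__(self, storage: StableStorage):
        self.storage = storage
        self.balance = 0  # volatile state, lost on a crash

    def receive(self, amount: int) -> None:
        # Log the event first (write-ahead), then apply it.
        self.storage.log.append(amount)
        self.balance += amount

    def take_checkpoint(self) -> None:
        self.storage.checkpoint = (self.balance, len(self.storage.log))

    def recover(self) -> None:
        # Restore the checkpoint, then replay only the events logged after it.
        balance, next_entry = self.storage.checkpoint
        for amount in self.storage.log[next_entry:]:
            balance += amount
        self.balance = balance
```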
44. Fusion based technique
◂ The fusion-based technique is an alternative because it requires fewer backup machines than the replication-based technique.
◂ The backup machines are created by fusing the given set of machines.
◂ The fusion-based technique has a very high overhead during the recovery process, so it is acceptable when the probability of a fault in the system is low.
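Fusion-based fault tolerance in the literature builds fused backup machines in general. The simplest way to see why it needs fewer backups is the XOR-parity special case sketched below. This is an illustrative assumption, not the general construction. One fused backup stores the XOR of the states of n primaries instead of n full copies, and any single crashed primary can be rebuilt from the backup and the survivors. Recovery has to read every surviving machine, which reflects the high recovery overhead noted on the slide. The integer states are hypothetical.

```python
from functools import reduce
from operator import xor
from typing import Dict, List


def fuse(states: List[int]) -> int:
    """Build the single fused backup: the XOR of all primary states."""
    return reduce(xor, states, 0)


def update_backup(backup: int, old: int, new: int) -> int:
    """Keep the backup current when one primary changes from `old` to `new`."""
    return backup ^ old ^ new


def recover(backup: int, survivors: Dict[int, int]) -> int:
    """Rebuild the one crashed primary from the fused backup and all survivors."""
    return reduce(xor, survivors.values(), backup)
```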
47. Conclusion
◂ This research showed the different types of fault tolerance techniques in distributed systems, such as checkpointing and the replication-based fault tolerance technique.
◂ Each mechanism has advantages over the others, and each is costly to deploy.
◂ A software fault tolerance system comprises checkpoint storage and rollback recovery mechanisms, while system fault tolerance is a complete system that provides both software and hardware fault tolerance, to ensure availability of the system during a failure, error, or fault.