Introduction to Distributed Systems
Introduction to DS
• A distributed system is a collection of autonomous computers
connected through a network, working together as a single cohesive
system.
• Characteristics: Distribution of resources, concurrency, and failure
independence.
• Examples: Internet, cloud computing, peer-to-peer networks.
Why Distributed Systems?
• Advantages: Increased performance, scalability, reliability, fault
tolerance, and resource sharing.
• Challenges: Coordination, communication, consistency, and security.
Design Goals of Distributed Systems
• Scalability: The system should be able to handle an increasing
number of users and resources.
• Reliability: The system should continue to function despite individual
component failures.
• Performance: The system should provide efficient and timely
responses to user requests.
• Transparency: The system should appear as a single, unified entity to
its users.
• Flexibility: The system should be adaptable to changing requirements
and environments.
Types of Distributed Systems
• Cluster Computing Systems:
• Cluster computing systems combine multiple machines or servers to form a
cluster that works together to perform large-scale computational tasks.
They distribute the workload among cluster nodes and often leverage
parallel processing techniques. Examples include Apache Hadoop and
Apache Spark.
• Grid Computing Systems:
• Grid computing systems connect geographically distributed resources to
form a virtual supercomputer. They enable the sharing of computing
power, storage, and data across different organizations or institutions. Grid
systems are typically used for scientific computing, research collaborations,
and resource-intensive applications.
Cont’d…
• Cloud Computing Systems:
• Cloud computing systems provide on-demand access to a pool of
computing resources, including virtual machines, storage, and services,
over the internet. They offer scalability, flexibility, and pay-per-use billing
models. Examples include Amazon Web Services (AWS), Microsoft Azure,
and Google Cloud Platform (GCP).
• Internet of Things (IoT) Systems:
• IoT systems connect a large number of devices and sensors, often
geographically distributed, to collect and exchange data. They involve
distributed processing, data aggregation, and coordination among devices
and services. IoT systems are used in various domains such as smart
homes, industrial automation, and smart cities.
Cont’d…
• Distributed File Systems:
• Distributed file systems are designed to provide a unified view of file
storage across multiple machines. They distribute file data and metadata
across nodes, allowing clients to access and manipulate files as if they were
stored on a single machine. Examples include the Google File System (GFS)
and the Hadoop Distributed File System (HDFS).
• Distributed Database Systems:
• Distributed database systems store and manage data across multiple
nodes to provide scalability, fault tolerance, and improved performance.
They distribute data across nodes and support distributed query processing
and transaction management. Examples include Apache Cassandra and
Apache HBase.
Architectural Models
• Client-Server Model
• Clients request services from servers.
• Servers provide services to clients.
• Example: Web applications with web browsers (clients) and web servers.
• Peer-to-Peer Model
• Peers communicate and collaborate directly with each other.
• Each peer can act as both a client and a server.
• Example: File sharing networks like BitTorrent.
• Hybrid Models
• Combine elements of both client-server and peer-to-peer models.
• Example: Distributed databases with dedicated server nodes and peer replication.
Next…
Communication Models
Communication in Distributed Systems
• Communication Models
• Message Passing: Communication through explicit message exchanges.
• Remote Procedure Call (RPC): Invoking procedures on remote machines.
• Publish-Subscribe: Subscribers receive notifications about events from publishers.
• Message Queues: Messages are stored in queues for asynchronous processing.
• Communication Protocols
• TCP/IP: Transmission Control Protocol over Internet Protocol; TCP adds reliable,
ordered, connection-oriented delivery on top of IP's best-effort packets.
• UDP: User Datagram Protocol for unreliable and connectionless communication.
• HTTP: Hypertext Transfer Protocol for web-based communication.
• MQTT: Message Queuing Telemetry Transport for lightweight publish-subscribe
messaging.
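As a rough illustration of the publish-subscribe and message-queue models above, the following in-process sketch gives each subscriber its own queue. The names `Broker`, `subscribe`, `publish`, and `drain` are hypothetical; a real system would place the broker behind a network protocol such as MQTT.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-process broker: one FIFO queue per subscriber (hypothetical design)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of subscriber queues

    def subscribe(self, topic):
        """Register a new subscriber and return its private message queue."""
        queue = deque()
        self._subscribers[topic].append(queue)
        return queue

    def publish(self, topic, message):
        """Append the message to the queue of every subscriber of the topic."""
        for queue in self._subscribers[topic]:
            queue.append(message)


def drain(queue):
    """Consume all pending messages; the publisher never waits for this step."""
    messages = []
    while queue:
        messages.append(queue.popleft())
    return messages
```

The publisher does not know who the subscribers are, and each subscriber reads its queue whenever it is ready, which is the decoupling both models aim for.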
Next…
Chapter Five
Consistency Models
Consistency Models
• Consistency models define the guarantees a distributed system gives about the
order in which updates are applied and when they become visible to readers.
• A replica is one copy of a piece of data (or of a component); a replicated
system keeps such copies on multiple nodes.
• Some commonly discussed consistency models include:
• Strong Consistency: A strongly consistent system behaves as if there were a
single copy of the data.
• Any read operation reflects the most recent completed write operation.
• Achieving strong consistency often requires coordination and
synchronization between replicas, which can impact performance and
availability.
Mechanisms for Strong Consistency
• Two-Phase Locking (2PL): This mechanism ensures that conflicting
operations on shared data are serialized.
• A transaction acquires locks before accessing shared data (growing phase)
and, once it has released any lock, acquires no further locks (shrinking phase).
• 2PL guarantees conflict serializability; the strict variant, which holds write
locks until commit or abort, also avoids cascading aborts. Both can introduce
contention and deadlocks and affect system performance.
• Distributed Transaction Commit Protocol: Atomic commitment across nodes,
where all participants commit or all abort, can be enforced using protocols
such as Two-Phase Commit (2PC) or Three-Phase Commit (3PC).
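A minimal sketch of the Two-Phase Commit idea, assuming in-memory participants and a coordinator that never fails. The `Participant` class and `two_phase_commit` function are hypothetical names; timeouts, durable logging, and coordinator recovery, which real implementations need, are left out.

```python
class Participant:
    """Hypothetical participant: votes in phase 1, obeys the decision in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every participant voted yes and the transaction committed."""
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only on a unanimous yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

A single "no" vote aborts the transaction on every participant, which is what makes the outcome atomic.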
Cont’d…
• Eventual Consistency: Eventual consistency allows replicas to
temporarily show different data but guarantees that, once updates stop,
they will eventually converge to the same state.
• This model relaxes the synchronization requirements, allowing
replicas to operate independently and asynchronously.
• It is often used in systems that prioritize availability and partition
tolerance over strict consistency.
Cont’d…
• Vector Clocks: Vector clocks are used to track the causal ordering of events
in a distributed system. Each replica maintains a vector clock that is
updated with each event. The vector clock information helps determine the
relative ordering of events across replicas.
• Anti-Entropy and Merkle Trees: Anti-entropy mechanisms, often driven by a
gossip protocol, periodically compare replicas and exchange missing updates to
synchronize data. Merkle trees (hash trees) make the comparison cheap: replicas
compare hashes from the root down and transfer only the data blocks whose
hashes differ.
• Conflict Resolution and Convergence: In eventual consistency, conflicts
may arise when concurrent updates occur on different replicas. Conflict
resolution techniques, such as Last-Writer-Wins (LWW), keeping all concurrent
versions for the application to merge, or conflict-free replicated data types
(CRDTs), are used to reconcile conflicting updates and converge the data to a
consistent state over time.
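The vector clock comparison described above can be sketched as follows. Clocks are plain dictionaries from node name to counter, and the helper names (`vc_increment`, `vc_merge`, `vc_compare`) are assumptions made for this example.

```python
def vc_increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def vc_merge(a, b):
    """Element-wise maximum, applied when a replica receives another replica's clock."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def vc_compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' for clocks a and b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

Two updates whose clocks compare as "concurrent" are exactly the ones that need a conflict resolution rule such as Last-Writer-Wins.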
Cont’d…
• Causal Consistency: Causal consistency ensures that the order of
causally related events is preserved across replicas.
• If one event causally depends on another, all replicas must observe
the same causal order. However, the ordering of unrelated events can
be different across replicas.
• Read/Write Consistency: Some systems provide different consistency
levels for read and write operations.
• For example, a system may offer strong consistency for write
operations to ensure data integrity but provide eventual consistency
for read operations to improve performance.
Cont’d…
• Dependency Tracking: Causal consistency mechanisms track the
causal dependencies between events. This can be done through
explicit metadata or implicit tracking based on the ordering of events.
• Lamport Clocks: Lamport clocks assign a logical timestamp to each
event such that if one event happens before another, its timestamp is
smaller. The converse does not hold, so Lamport timestamps alone cannot
tell causally related events from concurrent ones; ties are usually broken
by process ID to obtain a total order.
• Vector Clocks: Vector clocks, as mentioned earlier, are also used in
causal consistency mechanisms to track and enforce the causal
ordering of events across replicas.
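A small sketch of a Lamport clock, assuming timestamps are `(counter, process_id)` pairs so that ties between processes can be broken. The class name and method names are illustrative only.

```python
class LamportClock:
    """Logical clock: a counter per process, with the process id as a tie-breaker."""

    def __init__(self, pid):
        self.pid = pid
        self.time = 0

    def tick(self):
        """Local event or message send: advance the counter and return a timestamp."""
        self.time += 1
        return (self.time, self.pid)

    def receive(self, sender_time):
        """Message receipt: jump past the sender's counter, then count the receive."""
        self.time = max(self.time, sender_time) + 1
        return (self.time, self.pid)
```

Because the receiver always moves past the sender's counter, a message's receive timestamp is larger than its send timestamp, which keeps the timestamps consistent with causality.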
Next…
Replication
Replication Techniques
• Replication involves creating and maintaining multiple copies of data
or components across distributed systems.
• Replication offers several benefits, including increased availability,
fault tolerance, and performance.
• Here are some common replication techniques:
Cont’d…
• Primary-Backup Replication: In this approach, one replica (the primary)
handles all client requests and updates the backup replicas.
• If the primary replica fails, one of the backups takes over its role.
• This technique keeps a consistent copy of the data available, provided the
backups are up to date when the primary fails.
• State Machine Replication: State machine replication involves executing
the same sequence of deterministic commands on all replicas in the same order.
• Each replica applies the commands to its local state machine; because the
commands are deterministic, all replicas reach the same state.
• This technique provides strong consistency but can be resource-intensive
due to the need for synchronous communication and coordination.
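State machine replication can be illustrated with a deterministic key-value store. The sketch below assumes the replicas have already agreed on the command log, which in practice is the hard part and is usually solved with a consensus protocol. `KeyValueReplica` and `replicate` are hypothetical names.

```python
class KeyValueReplica:
    """Deterministic state machine: the state depends only on the applied commands."""

    def __init__(self):
        self.state = {}

    def apply(self, command):
        op, key, *rest = command
        if op == "set":
            self.state[key] = rest[0]
        elif op == "delete":
            self.state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")


def replicate(log, replicas):
    """Apply the agreed-upon log to every replica in the same order."""
    for command in log:
        for replica in replicas:
            replica.apply(command)
```

For example, applying `[("set", "x", 1), ("delete", "x")]` to three fresh replicas leaves all three with an empty state.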
Cont’d…
• Quorum-Based Replication: Quorum-based replication requires a
certain number of replicas to agree on a write operation before it is
considered successful.
• The quorum can be a majority, a fixed number, or a percentage of the
replicas.
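• Quorum-based replication balances the trade-off between
consistency and performance, allowing systems to continue operating
as long as a sufficient number of replicas are available. With N replicas,
choosing a write quorum W and a read quorum R such that W + R > N makes
every read quorum overlap the most recent write quorum.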
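A rough sketch of quorum reads and writes, assuming each replica is a dictionary from key to `(version, value)` and that an unreachable replica is modelled as `None`. The function names are hypothetical, and a failed write is not rolled back here, which a production system would have to handle.

```python
def quorums_overlap(n, w, r):
    """True if every read quorum intersects every write quorum (R + W > N)."""
    return r + w > n


def quorum_write(replicas, key, value, version, w):
    """Write to every reachable replica; succeed only if at least w acknowledge."""
    acks = 0
    for replica in replicas:
        if replica is not None:  # None models an unreachable replica
            replica[key] = (version, value)
            acks += 1
    return acks >= w


def quorum_read(replicas, key, r):
    """Read from the first r reachable replicas holding the key; newest version wins."""
    answers = [rep[key] for rep in replicas if rep is not None and key in rep][:r]
    if len(answers) < r:
        raise RuntimeError("read quorum not reached")
    return max(answers, key=lambda answer: answer[0])[1]
```

With N = 3 and W = R = 2, a read that contacts one stale replica still sees the newest version from the other replica in its quorum.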
Next…
Synchronization in distributed systems
Synchronization in distributed systems
• Synchronization in distributed systems refers to the coordination and
ordering of actions or events across multiple processes or
components in the system.
• It involves ensuring that processes execute their tasks in a mutually
agreed-upon and consistent manner.
• Synchronization is important in distributed systems for several
reasons:
Cont’d…
• Consistency: Synchronization helps maintain data consistency and coherence
across replicas or shared resources. It ensures that concurrent operations do not
lead to conflicts or inconsistencies in the system.
• Correctness: Synchronization is essential for correctness in distributed systems. It
allows processes to coordinate their actions, enforce dependencies, and ensure
that critical operations are executed in the correct order.
• Mutual Exclusion: Synchronization mechanisms, such as locks or semaphores,
enable processes to access shared resources in a mutually exclusive manner. This
prevents race conditions and ensures that only one process can access a resource
at a time.
• Coordination: Synchronization facilitates coordination and cooperation between
processes. It enables processes to communicate, exchange data, and synchronize
their activities to achieve a common goal.
1. Locking Mechanisms:
• Distributed Locks: Processes can acquire and release distributed locks
to ensure exclusive access to shared resources. Distributed mutual exclusion
algorithms, such as the Ricart-Agrawala algorithm or Lamport's mutual
exclusion algorithm, enable processes to coordinate their access to
critical sections.
• Two-Phase Locking (2PL): 2PL is a concurrency control mechanism
that ensures serializability by requiring a transaction to acquire all the
locks it needs before it releases any of them.
2. Message Passing:
• Message Ordering: Messages exchanged between processes can be
ordered to ensure causality or sequential consistency.
• Techniques like Lamport timestamps or vector clocks are used to
establish the order of events in a distributed system.
• Barrier Synchronization: Processes can synchronize their activities
using barriers, where each process waits until all participating
processes have reached the barrier before continuing execution.
• Barriers are commonly used to synchronize parallel or distributed
computations.
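Barrier synchronization can be demonstrated on a single machine with threads standing in for distributed processes. This sketch uses Python's `threading.Barrier`; the `run_phases` function is a hypothetical helper written for this example.

```python
import threading


def run_phases(num_workers):
    """Each worker records phase 1, waits at the barrier, then records phase 2."""
    barrier = threading.Barrier(num_workers)
    events = []
    lock = threading.Lock()

    def worker(worker_id):
        with lock:
            events.append(("phase1", worker_id))
        barrier.wait()  # blocks until all num_workers threads have arrived
        with lock:
            events.append(("phase2", worker_id))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

No worker records phase 2 until every worker has recorded phase 1, regardless of how the threads are scheduled.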
3. Consensus and Coordination Protocols:
• Distributed Consensus: Distributed consensus protocols, such as Paxos or
Raft, enable processes to agree on a common value or make coordinated
decisions despite failures or network partitions, as long as a majority of
the processes can still communicate.
• Distributed Coordination: Mechanisms like distributed semaphores or
condition variables allow processes to coordinate their activities based on
certain conditions or events. Processes can signal or wait for specific
conditions to be met before proceeding.
4. Clock Synchronization:
• Distributed systems often require synchronized
clocks to order events accurately. Clock synchronization protocols like the
Network Time Protocol (NTP) or the Precision Time Protocol (PTP) are used
to ensure that clocks across different nodes are closely aligned.
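The core calculation behind NTP-style synchronization uses four timestamps from one request/response exchange. The sketch below shows only that arithmetic, under the usual assumption that network delay is roughly symmetric; the function name is hypothetical.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one exchange.

    t0: client send time and t3: client receive time (client clock);
    t1: server receive time and t2: server send time (server clock).
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # how far the server clock is ahead
    delay = (t3 - t0) - (t2 - t1)         # time spent on the network, both ways
    return offset, delay
```

The client would then adjust its clock by the estimated offset, typically gradually rather than in one jump.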
Next…
Fault Tolerance and Recovery
Fault Tolerance and Recovery
• Fault Types
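• Fail-Stop: A process halts permanently, and other processes can reliably detect the failure.
• Crash-Recovery: A process halts but can later recover and resume, typically from state saved on stable storage.
• Byzantine: A process exhibits arbitrary, possibly malicious, behavior.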
• Fault Tolerance Techniques
• Redundancy: Replicating components to provide backup.
• Error Detection and Correction: Using checksums and error-correcting codes.
• Checkpointing: Periodically saving the system state to enable recovery.
• Recovery Protocols
• Restart-Based Recovery: Restarting failed components from a clean state.
• Rollback-Recovery: Returning the system to a previously consistent state.
• Checkpoint-Based Recovery: Using saved checkpoints to restore the system
state.
Cont’d…
• Checkpointing: Checkpointing is a technique used to periodically save
the system's state.
• By creating checkpoints at certain intervals, the system can recover
from failures by restoring the state to a previously consistent point.
• Checkpointing involves saving critical data and metadata to a stable
storage medium to enable recovery and resumption of operations.
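A minimal checkpointing sketch, assuming the state is JSON-serializable and stable storage is a local file. The function names and the file layout are assumptions for illustration; writing to a temporary file and renaming it avoids leaving a half-written checkpoint behind after a crash.

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    """Write the state to a temporary file, then atomically rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the data to stable storage before renaming
    os.replace(tmp_path, path)  # readers see the old or the new file, never a partial one


def load_checkpoint(path, default=None):
    """Return the last saved state, or the default if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

On restart, a process would call `load_checkpoint` and resume from the returned state instead of starting from scratch.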
Process resilience
• Process resilience in distributed systems refers to the ability of
individual processes or components within the system to withstand
failures and continue functioning properly.
• It involves designing and implementing mechanisms that enable
processes to recover from failures, adapt to changing conditions, and
maintain system availability.
• Here are some key aspects of process resilience in distributed
systems:
Fault Detection and Failure Handling:
• Monitoring: Processes should be continuously monitored to detect
failures or abnormal behavior. This can be done through heartbeat
mechanisms, timeouts, or periodic health checks.
• Failure Handling: When a failure is detected, the system should have
mechanisms in place to handle it. This may involve restarting the
failed process, migrating it to a different node, or distributing its
workload among other processes.
• Redundancy and Replication: Replicating critical processes across
multiple nodes provides fault tolerance. If one replica fails, another
replica can take over its responsibilities. This ensures that the system
can continue functioning even in the presence of failures.
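The heartbeat mechanism mentioned above can be sketched as a timeout-based failure detector. `HeartbeatMonitor` is a hypothetical class, and the current time is passed in explicitly so the behavior is easy to follow and test.

```python
class HeartbeatMonitor:
    """Suspects a process if no heartbeat arrived within the timeout."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, process_id, now):
        """Record that a heartbeat from process_id arrived at time `now`."""
        self.last_seen[process_id] = now

    def suspected(self, now):
        """Return the processes whose last heartbeat is older than the timeout."""
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)
```

A suspected process is not necessarily dead: a slow network produces the same symptom, so the timeout trades detection speed against false alarms.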
Error Recovery and Resynchronization:
• Error Handling: Processes should be designed to handle errors gracefully.
They should have appropriate error handling mechanisms, such as
exception handling and retries, to recover from transient failures.
• Checkpointing: Checkpointing involves periodically saving the state of a
process to stable storage. In case of failure, the process can recover by
restoring its state from the last checkpoint. This ensures that the process
can resume its operation without losing significant progress.
• State Resynchronization: If a failed process needs to be restarted or
migrated, its state may need to be resynchronized with other processes.
This can be achieved through techniques like message replay or state
transfer from other replicas.
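Retrying after transient failures, as mentioned under error handling, is often combined with exponential backoff. The following is a sketch under simple assumptions: every exception is treated as transient, and the `retry` helper and its parameters are hypothetical.

```python
import time


def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on an exception, wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
```

Real code would usually catch only the exception types known to be transient and add random jitter to the delay so that many clients do not retry at the same moment.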
Load Balancing and Elasticity:
• Load Distribution: Processes should be dynamically balanced across
nodes to ensure that the workload is evenly distributed. Load
balancing mechanisms, such as request routing algorithms or
dynamic resource allocation, can help achieve this.
• Elastic Scaling: Process resilience involves the ability to scale the
system based on varying workload demands.
• Processes should be able to scale up or down dynamically to handle
increased or decreased load, respectively.
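One possible request routing rule is "least loaded first". The sketch below tracks in-flight requests per node and shows how a newly added node picks up traffic; `LeastLoadedBalancer` and its methods are hypothetical names.

```python
class LeastLoadedBalancer:
    """Routes each request to the node with the fewest in-flight requests."""

    def __init__(self, nodes):
        self.in_flight = {node: 0 for node in nodes}

    def acquire(self):
        """Pick the least-loaded node (ties broken by name) and count the request."""
        node = min(self.in_flight, key=lambda n: (self.in_flight[n], n))
        self.in_flight[node] += 1
        return node

    def release(self, node):
        """Mark one request on the node as finished."""
        self.in_flight[node] -= 1

    def add_node(self, node):
        """Scale out: a new node starts with no load, so it is chosen first."""
        self.in_flight.setdefault(node, 0)
```

Scaling down would remove a node from the table once its in-flight count reaches zero.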
Next…
Security Mechanisms in Distributed Systems
Security Challenges in Distributed Systems
• Distributed systems face various security challenges due to their
distributed nature and the potential for attacks on different
components.
• Some common security challenges include:
• Authentication and Authorization: Ensuring the identity of
participants in the system and granting appropriate access rights to
resources.
• Data Confidentiality: Protecting sensitive data from unauthorized
access or disclosure during transmission and storage.
Cont’d…
• Data Integrity: Ensuring that data remains unchanged and
uncorrupted during transmission and storage.
• Availability: Preventing denial-of-service attacks and ensuring that
the system remains accessible to legitimate users.
• Secure Communication: Establishing secure channels for
communication between distributed components to prevent
eavesdropping or tampering.
Security Mechanisms in Distributed Systems
• To address the security challenges, various security mechanisms are
employed in distributed systems. These mechanisms include:
• Encryption: Encrypting data using cryptographic algorithms to protect its
confidentiality; integrity additionally requires a message authentication code
or an authenticated encryption mode.
• Encryption ensures that only holders of the decryption key can read
the data.
• Digital Signatures: Using digital signatures to verify the authenticity and
integrity of messages.
• Digital signatures provide a way to verify that a message has been sent by
the claimed sender and that it hasn't been modified during transmission.
Cont’d…
• A digital signature is a cryptographic mechanism used to verify the
authenticity, integrity, and non-repudiation of digital documents or
messages.
• It provides a way to ensure that a message or document comes from
a particular sender and has not been altered during transmission.
Here's an overview of how digital signatures are implemented:
Cont’d…
• Key Pair Generation:
• Key Pair: The sender generates a key pair consisting of a public key and a
private key. The private key is kept secret and is used for signing; the public
key is shared with others and is used for verifying the digital signatures
created with the private key.
• Signature Generation:
• Hashing: The sender computes a cryptographic hash (for example SHA-256)
of the document or message to be signed. The hash function produces a
fixed-length digest, and finding two documents with the same digest is
computationally infeasible.
• Signing: The sender then transforms the hash value using their private key
(in RSA-style schemes this is often described as encrypting the hash). The
result, known as the digital signature, is specific to the document and the
sender's private key.
Cont’d…
• Signature Verification:
• Signature Extraction: The recipient of the document retrieves the digital
signature attached to the document.
• Public Key Usage: In RSA-style schemes, the recipient applies the sender's
public key to the digital signature, obtaining the original hash value.
• Hash Calculation: The recipient independently calculates the hash of the
received document using the same hash function.
• Comparison: The recipient compares the calculated hash with the
hash recovered from the digital signature. If the two values match, the
document has not been tampered with during transmission and was signed
with the private key that matches the sender's public key.
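The hash-then-sign steps above can be walked through with textbook RSA in pure Python. This is a toy sketch only: the primes are two known Mersenne primes chosen for convenience, the modulus is far too small to be secure, and no padding scheme is used. Real systems should rely on a vetted cryptographic library. All names below are assumptions made for this example.

```python
import hashlib

# Toy parameters for illustration only (insecure key size, no padding).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(message: bytes) -> int:
    """SHA-256 of the message as an integer, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender: transform the hash with the private key."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Recipient: recover the hash with the public key and compare with a fresh hash."""
    return pow(signature, E, N) == digest(message)
```

Verification fails if either the message or the signature is altered, because the recovered hash no longer matches the freshly computed one.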
Cont’d…
• Access Control: Implementing access control mechanisms to enforce
authorization policies and restrict access to resources based on user
roles and permissions.
• Firewalls and Intrusion Detection Systems (IDS): Deploying firewalls
and IDS to monitor network traffic, detect and prevent unauthorized
access attempts, and identify potential security breaches.
• Secure Communication Protocols: Using secure communication
protocols such as TLS (Transport Layer Security, the successor to the
deprecated Secure Sockets Layer, SSL) to establish secure connections
and protect data during transmission.
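For the access control item above, a role-based check might look like the following sketch. The roles, permissions, and the `is_allowed` function are hypothetical examples, not a reference to any particular framework.

```python
ROLE_PERMISSIONS = {  # hypothetical policy for illustration
    "admin": {"read", "write", "delete"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}


def is_allowed(user_roles, action, role_permissions=ROLE_PERMISSIONS):
    """Grant the action if any of the user's roles carries the permission."""
    return any(action in role_permissions.get(role, set()) for role in user_roles)
```

Unknown roles grant nothing, so the check denies by default.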
Cont’d…
• Security Auditing and Logging: Implementing auditing and logging
mechanisms to record and monitor system activities, detect security
incidents, and facilitate forensic analysis in case of security breaches.
• Logging mechanisms
• Log Generation: Distributed systems generate logs that capture relevant
events and activities. Logs can include information such as user actions,
system events, network activities, authentication attempts, and error
conditions.
• Log Format: Logs are typically stored in a standardized format, such as
plain text or structured formats like JSON or XML. The log format should
include essential details such as timestamps, event descriptions, source IP
addresses, and other relevant metadata.
Cont’d…
• Log Collection: Logs from distributed components are collected and
consolidated in a centralized location or a distributed log
management system.
• This centralization enables easier analysis, correlation, and search
capabilities across multiple log sources.
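Log generation in a structured format and central collection can be sketched as below. The field names follow the details listed above (timestamp, event description, source IP); `make_log_entry` and `collect` are hypothetical helpers, and each component's log is assumed to be a list of JSON lines.

```python
import json
from datetime import datetime, timezone


def make_log_entry(event, source_ip, **metadata):
    """Serialize one event as a JSON line with a UTC timestamp."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "source_ip": source_ip,
        **metadata,
    }
    return json.dumps(entry, sort_keys=True)


def collect(*sources):
    """Merge log lines from several components and order them by timestamp."""
    entries = [json.loads(line) for source in sources for line in source]
    return sorted(entries, key=lambda e: datetime.fromisoformat(e["timestamp"]))
```

Ordering by timestamp across machines is only as accurate as the clock synchronization between them.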
Auditing Mechanism:
• Security Policies: Establishing security policies and standards is
crucial for auditing.
• These policies define what activities are considered normal and
acceptable within the system, and what actions should be flagged as
potential security incidents.
• Event Monitoring: Auditing involves monitoring and analyzing logged
events to identify potential security issues or violations of security
policies.
• This may involve using automated tools, intrusion detection systems,
or manual reviews of log entries.
Cont’d…
• Alerting and Notifications: Auditing systems can be configured to
generate alerts or notifications when specific security-related events
or patterns are detected.
• These alerts can be sent to security administrators or a Security
Operations Center (SOC) for immediate response and investigation.
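As one example of a detection pattern, an auditing rule could raise an alert when a source IP accumulates several failed logins within a short window. The threshold, window, event layout, and function name below are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque


def failed_login_alerts(events, threshold=3, window=60):
    """Return (source_ip, time) alerts for repeated failed logins.

    `events` is an iterable of (timestamp_seconds, source_ip, succeeded) in time
    order; an alert fires when an IP has `threshold` failures within `window` seconds.
    """
    recent = defaultdict(deque)  # source_ip -> timestamps of recent failures
    alerts = []
    for ts, ip, succeeded in events:
        if succeeded:
            continue
        failures = recent[ip]
        failures.append(ts)
        while failures and ts - failures[0] > window:
            failures.popleft()  # drop failures that fell out of the window
        if len(failures) >= threshold:
            alerts.append((ip, ts))
    return alerts
```

The returned alerts could then be forwarded to administrators or a SOC by whatever notification channel the system uses.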
Thank you for your attention.

Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 

Recently uploaded (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Chapter Introductionn to distributed system .pptx

  • 2. Introduction to DS • A distributed system is a collection of autonomous computers connected through a network, working together as a single cohesive system. • Characteristics: Distribution of resources, concurrency, and failure independence. • Examples: Internet, cloud computing, peer-to-peer networks.
  • 3. Why Distributed Systems? • Advantages: Increased performance, scalability, reliability, fault tolerance, and resource sharing. • Challenges: Coordination, communication, consistency, and security.
  • 4. Design Goals of Distributed Systems • Scalability: The system should be able to handle an increasing number of users and resources. • Reliability: The system should continue to function despite individual component failures. • Performance: The system should provide efficient and timely responses to user requests. • Transparency: The system should appear as a single, unified entity to its users. • Flexibility: The system should be adaptable to changing requirements and environments.
  • 5. Types of Distributed Systems • Cluster Computing Systems: • Cluster computing systems combine multiple machines or servers to form a cluster that works together to perform large-scale computational tasks. They distribute the workload among cluster nodes and often leverage parallel processing techniques. Examples include Apache Hadoop and Apache Spark. • Grid Computing Systems: • Grid computing systems connect geographically distributed resources to form a virtual supercomputer. They enable the sharing of computing power, storage, and data across different organizations or institutions. Grid systems are typically used for scientific computing, research collaborations, and resource-intensive applications.
  • 6. Cont’d… • Cloud Computing Systems: • Cloud computing systems provide on-demand access to a pool of computing resources, including virtual machines, storage, and services, over the internet. They offer scalability, flexibility, and pay-per-use billing models. Examples include Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). • Internet of Things (IoT) Systems: • IoT systems connect a large number of devices and sensors, often geographically distributed, to collect and exchange data. They involve distributed processing, data aggregation, and coordination among devices and services. IoT systems are used in various domains such as smart homes, industrial automation, and smart cities.
  • 7. Cont’d… • Distributed File Systems: • Distributed file systems are designed to provide a unified view of file storage across multiple machines. They distribute file data and metadata across nodes, allowing clients to access and manipulate files as if they were stored on a single machine. Examples include the Google File System (GFS) and the Hadoop Distributed File System (HDFS). • Distributed Database Systems: • Distributed database systems store and manage data across multiple nodes to provide scalability, fault tolerance, and improved performance. They distribute data across nodes and support distributed query processing and transaction management. Examples include Apache Cassandra and Apache HBase.
  • 8. Architectural Models • Client-Server Model • Clients request services from servers. • Servers provide services to clients. • Example: Web applications with web browsers (clients) and web servers. • Peer-to-Peer Model • Peers communicate and collaborate directly with each other. • Peers can act as clients or servers. • Example: File sharing networks like BitTorrent. • Hybrid Models • Combine elements of both client-server and peer-to-peer models. • Example: Distributed databases with dedicated server nodes and peer replication.
  • 10. Communication in Distributed Systems • Communication Models • Message Passing: Communication through explicit message exchanges. • Remote Procedure Call (RPC): Invoking procedures on remote machines. • Publish-Subscribe: Subscribers receive notifications about events from publishers. • Message Queues: Messages are stored in queues for asynchronous processing. • Communication Protocols • TCP/IP: Transmission Control Protocol/Internet Protocol for reliable and connection-oriented communication. • UDP: User Datagram Protocol for unreliable and connectionless communication. • HTTP: Hypertext Transfer Protocol for web-based communication. • MQTT: Message Queuing Telemetry Transport for lightweight publish-subscribe messaging.
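As a rough illustration of the publish-subscribe model listed above, the following is a minimal in-process sketch. The `Broker` class and the topic names are hypothetical and stand in for a real broker (for example an MQTT server); no network communication is involved.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-process stand-in for a publish-subscribe broker."""

    def __init__(self):
        # Maps each topic name to the list of handlers subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable that is invoked for every message on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver the message to all subscribers; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


if __name__ == "__main__":
    broker = Broker()
    broker.subscribe("sensors/temp", lambda m: print("got", m))
    broker.publish("sensors/temp", "21.5")
```

The publisher never addresses a subscriber directly; it only names a topic, which is the decoupling that distinguishes this model from plain message passing.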
  • 12. Consistency Models • Consistency models define the guarantees about the order and visibility of data in a distributed system. • A replica refers to a copy of data or a component in a distributed system that is stored and maintained on multiple nodes. • Some commonly discussed consistency models include: • Strong Consistency: In a strongly consistent system, all replicas show the same data at all times. • Any read operation immediately reflects the most recent write operation. • Achieving strong consistency often requires coordination and synchronization between replicas, which can impact performance and availability.
  • 13. Types of Strong Consistency • Two-Phase Locking (2PL): This mechanism ensures that conflicting operations on shared data are serialized. • A transaction acquires locks before accessing shared data and, once it has released any lock, acquires no further locks. • 2PL guarantees serializability but can introduce contention, risks deadlock, and can affect system performance. • Distributed Transaction Commit Protocol: Consistency can be enforced using distributed transaction commit protocols, such as the Two-Phase Commit (2PC) or Three-Phase Commit (3PC).
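The Two-Phase Commit protocol mentioned above can be sketched as follows. This is a simplified, single-process illustration: the `Participant` class and its `can_commit` flag are hypothetical, and timeouts, logging to stable storage, and coordinator failure are deliberately left out.

```python
class Participant:
    """Hypothetical participant that votes in phase 1 and obeys phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes (and promise to commit) or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]   # phase 1: voting
    if all(votes):
        for p in participants:                    # phase 2: global commit
            p.commit()
        return "committed"
    for p in participants:                        # phase 2: global abort
        p.abort()
    return "aborted"
```

A single "no" vote aborts the transaction everywhere, which is what keeps the participants consistent with each other.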
  • 14. Cont’d… • Eventual Consistency: Eventual consistency allows replicas to temporarily show different data but guarantees that they will eventually converge to a consistent state. • This model relaxes the synchronization requirements, allowing replicas to operate independently and asynchronously. • It is often used in systems that prioritize availability and partition tolerance over strict consistency.
  • 15. Cont’d… • Vector Clocks: Vector clocks are used to track the causal ordering of events in a distributed system. Each replica maintains a vector clock that is updated with each event. The vector clock information helps determine the relative ordering of events across replicas. • Anti-Entropy and Merkle Trees: Anti-entropy mechanisms, such as the Gossip Protocol, periodically exchange updates between replicas to synchronize data. Merkle trees are used to efficiently detect differences and reconcile inconsistencies between replicas by verifying the integrity of data blocks. • Conflict Resolution and Convergence: In eventual consistency, conflicts may arise when concurrent updates occur on different replicas. Conflict resolution techniques, such as Last-Writer-Wins (LWW) or Multi-Value Convergence (MVC), are used to reconcile conflicting updates and converge the data to a consistent state over time.
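A minimal sketch of vector clocks, assuming each clock is represented as a dictionary from a node name to a counter. The function names `vc_increment`, `vc_merge`, and `vc_compare` are illustrative choices, not part of any particular library.

```python
def vc_increment(clock, node):
    """Return a copy of the clock with the node's own counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def vc_merge(a, b):
    """Element-wise maximum, applied when a replica receives another's clock."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def vc_compare(a, b):
    """Classify the relation between two clocks."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b
    if b_le_a:
        return "after"       # b happened before a
    return "concurrent"      # neither dominates: a conflict to resolve
```

Two updates whose clocks compare as "concurrent" are exactly the cases where a conflict resolution rule such as Last-Writer-Wins has to step in.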
  • 16. Cont’d… • Causal Consistency: Causal consistency ensures that the order of causally related events is preserved across replicas. • If one event causally depends on another, all replicas must observe the same causal order. However, the ordering of unrelated events can be different across replicas. • Read/Write Consistency: Some systems provide different consistency levels for read and write operations. • For example, a system may offer strong consistency for write operations to ensure data integrity but provide eventual consistency for read operations to improve performance.
  • 17. Cont’d… • Dependency Tracking: Causal consistency mechanisms track the causal dependencies between events. This can be done through explicit metadata or implicit tracking based on the ordering of events. • Lamport Clocks: Lamport clocks assign a logical timestamp to each event so that if one event causally precedes another, its timestamp is smaller. The converse does not hold: a smaller timestamp does not prove causal precedence, so Lamport clocks alone cannot identify concurrent events. Ties between processes are usually broken by process ID to obtain a total order. • Vector Clocks: Vector clocks, as mentioned earlier, are also used in causal consistency mechanisms to track and enforce the causal ordering of events across replicas.
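A Lamport clock can be sketched in a few lines. The class name and method names below are illustrative; the update rules (increment on local events and sends, take the maximum plus one on receive) are the standard ones.

```python
class LamportClock:
    """Logical clock for one process."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Advance the clock and return the timestamp to attach to a message."""
        return self.tick()

    def receive(self, message_time):
        """Jump past the sender's timestamp so the receive orders after the send."""
        self.time = max(self.time, message_time) + 1
        return self.time
```

Because `receive` always moves past the timestamp carried by the message, the receive event is guaranteed a larger timestamp than the matching send.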
  • 19. Replication Techniques • Replication involves creating and maintaining multiple copies of data or components across distributed systems. • Replication offers several benefits, including increased availability, fault tolerance, and performance. • Here are some common replication techniques:
  • 20. Cont’d… • Primary-Backup Replication: In this approach, one replica (the primary) handles all client requests and updates the backup replicas. • If the primary replica fails, one of the backups takes over its role. • This technique ensures that there is always a consistent copy of the data available. • State Machine Replication: State machine replication involves executing the same set of commands on all replicas in the same order. • Each replica applies the commands to its local state machine, ensuring that they all reach the same state. • This technique provides strong consistency but can be resource-intensive due to the need for synchronous communication and coordination.
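The idea behind state machine replication can be sketched as below, assuming an agreed-upon command log already exists (in practice a consensus protocol produces it). The `KeyValueReplica` class and the command tuples are hypothetical.

```python
class KeyValueReplica:
    """Hypothetical deterministic state machine: a small key-value store."""

    def __init__(self):
        self.state = {}
        self.applied = 0   # number of log entries applied so far

    def apply(self, command):
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        self.applied += 1


def replicate(log, replicas):
    """Apply every command, in log order, to every replica."""
    for command in log:
        for replica in replicas:
            replica.apply(command)
```

Since every replica starts from the same state and applies the same deterministic commands in the same order, all replicas end in the same state.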
  • 21. Cont’d… • Quorum-Based Replication: Quorum-based replication requires a certain number of replicas to agree on a write operation before it is considered successful. • The quorum can be a majority, a fixed number, or a percentage of the replicas. • Quorum-based replication balances the trade-off between consistency and performance, allowing systems to continue operating as long as a sufficient number of replicas are available.
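A common way to reason about quorum sizes, sketched below, uses N replicas, a write quorum W, and a read quorum R: when R + W > N, every read quorum overlaps every write quorum in at least one replica. The function names and the (version, value) response format are assumptions made for this example.

```python
def majority(n):
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def quorums_overlap(n, r, w):
    """True if any read quorum must share a replica with any write quorum."""
    return r + w > n


def quorum_read(responses, r):
    """Return the value with the highest version among at least r responses.

    `responses` is a list of (version, value) pairs from replicas that replied.
    """
    if len(responses) < r:
        raise RuntimeError("not enough replicas responded for a read quorum")
    return max(responses, key=lambda pair: pair[0])[1]
```

With N = 3, choosing R = W = 2 keeps reads and writes available when one replica is down while still guaranteeing overlap.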
  • 23. Synchronization in distributed systems • Synchronization in distributed systems refers to the coordination and ordering of actions or events across multiple processes or components in the system. • It involves ensuring that processes execute their tasks in a mutually agreed-upon and consistent manner. • Synchronization is important in distributed systems for several reasons:
  • 24. Cont’d… • Consistency: Synchronization helps maintain data consistency and coherence across replicas or shared resources. It ensures that concurrent operations do not lead to conflicts or inconsistencies in the system. • Correctness: Synchronization is essential for correctness in distributed systems. It allows processes to coordinate their actions, enforce dependencies, and ensure that critical operations are executed in the correct order. • Mutual Exclusion: Synchronization mechanisms, such as locks or semaphores, enable processes to access shared resources in a mutually exclusive manner. This prevents race conditions and ensures that only one process can access a resource at a time. • Coordination: Synchronization facilitates coordination and cooperation between processes. It enables processes to communicate, exchange data, and synchronize their activities to achieve a common goal.
  • 25. 1. Locking Mechanisms: • Distributed Locks: Processes can acquire and release distributed locks to ensure exclusive access to shared resources. Distributed mutual exclusion algorithms, such as the Ricart-Agrawala algorithm or Lamport's mutual exclusion algorithm, enable processes to coordinate their access to critical sections. • Two-Phase Locking (2PL): 2PL is a concurrency control mechanism that ensures serializability: a transaction acquires locks on resources before accessing them and acquires no new locks once it has started releasing them.
  • 26. 2. Message Passing: • Message Ordering: Messages exchanged between processes can be ordered to ensure causality or sequential consistency. • Techniques like Lamport timestamps or vector clocks are used to establish the order of events in a distributed system. • Barrier Synchronization: Processes can synchronize their activities using barriers, where each process waits until all participating processes have reached the barrier before continuing execution. • Barriers are commonly used to synchronize parallel or distributed computations.
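Barrier synchronization can be illustrated on a single machine with threads and `threading.Barrier` from the Python standard library. This is only an analogy for the distributed case; the function name `run_with_barrier` and the phase labels are made up for the example.

```python
import threading


def run_with_barrier(num_workers):
    """Run workers that all finish phase 1 before any of them starts phase 2."""
    events = []
    lock = threading.Lock()
    barrier = threading.Barrier(num_workers)

    def worker(worker_id):
        with lock:
            events.append(("phase1", worker_id))
        barrier.wait()  # blocks until all num_workers threads have arrived
        with lock:
            events.append(("phase2", worker_id))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

The order of workers within a phase may vary between runs, but no "phase2" entry can appear before the last "phase1" entry.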
  • 27. Consensus and Coordination Protocols: • Distributed Consensus: Distributed consensus protocols, such as Paxos or Raft, enable processes to agree on a common value or make coordinated decisions even in the presence of failures or network partitions. • Distributed Coordination: Mechanisms like distributed semaphores or condition variables allow processes to coordinate their activities based on certain conditions or events. Processes can signal or wait for specific conditions to be met before proceeding. • Clock Synchronization: Distributed systems often require synchronized clocks to order events accurately. Clock synchronization protocols like the Network Time Protocol (NTP) or the Precision Time Protocol (PTP) are used to ensure that clocks across different nodes are closely aligned.
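The core arithmetic behind an NTP-style exchange can be sketched as follows. The client records when it sends a request (t0) and receives the reply (t3); the server records when it receives the request (t1) and sends the reply (t2). The function name is illustrative, and the estimate assumes the network delay is roughly the same in both directions.

```python
def estimate_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from four timestamps.

    t0: client send time (client clock)
    t1: server receive time (server clock)
    t2: server send time (server clock)
    t3: client receive time (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2   # how far the client clock is behind
    delay = (t3 - t0) - (t2 - t1)          # time spent on the network
    return offset, delay
```

For example, if the client clock is 5 seconds behind and each direction takes 0.1 seconds, the client would add the estimated offset of about 5 seconds to its clock.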
  • 29. Fault Tolerance and Recovery • Fault Types • Fail-Stop: A process halts and cannot recover. • Crash-Recovery: A process halts but can recover and resume. • Byzantine: A process exhibits arbitrary, possibly malicious, behavior. • Fault Tolerance Techniques • Redundancy: Replicating components to provide backup. • Error Detection and Correction: Using checksums and error-correcting codes. • Checkpointing: Periodically saving the system state to enable recovery. • Recovery Protocols • Restart-Based Recovery: Restarting failed components from a clean state. • Rollback-Recovery: Returning the system to a previously consistent state. • Checkpoint-Based Recovery: Using saved checkpoints to restore the system state.
  • 30. Cont’d… • Checkpointing: Checkpointing is a technique used to periodically save the system's state. • By creating checkpoints at certain intervals, the system can recover from failures by restoring the state to a previously consistent point. • Checkpointing involves saving critical data and metadata to a stable storage medium to enable recovery and resumption of operations.
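A minimal sketch of checkpointing to local disk, assuming the state is JSON-serializable and that a local file is an acceptable stand-in for stable storage. The function names and the file layout are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then atomically rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())      # push the data to disk before the rename
    os.replace(tmp_path, path)    # readers see either the old or the new file


def load_checkpoint(path, default):
    """Return the last saved state, or the default if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

Writing to a temporary file first means a crash in the middle of a save leaves the previous checkpoint intact rather than a half-written one.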
  • 31. Process resilience • Process resilience in distributed systems refers to the ability of individual processes or components within the system to withstand failures and continue functioning properly. • It involves designing and implementing mechanisms that enable processes to recover from failures, adapt to changing conditions, and maintain system availability. • Here are some key aspects of process resilience in distributed systems:
  • 32. Fault Detection and Failure Handling: • Monitoring: Processes should be continuously monitored to detect failures or abnormal behavior. This can be done through heartbeat mechanisms, timeouts, or periodic health checks. • Failure Handling: When a failure is detected, the system should have mechanisms in place to handle it. This may involve restarting the failed process, migrating it to a different node, or distributing its workload among other processes. • Redundancy and Replication: Replicating critical processes across multiple nodes provides fault tolerance. If one replica fails, another replica can take over its responsibilities. This ensures that the system can continue functioning even in the presence of failures.
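Heartbeat-based monitoring can be sketched as a small timeout-based failure detector. The `HeartbeatMonitor` class is hypothetical, and the current time is passed in explicitly so the behavior is easy to follow; a real monitor would read a clock and receive heartbeats over the network.

```python
class HeartbeatMonitor:
    """Suspects a node when no heartbeat has arrived within the timeout."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}   # node name -> time of the most recent heartbeat

    def heartbeat(self, node, now):
        """Record that a heartbeat from the node arrived at time `now`."""
        self.last_seen[node] = now

    def suspected(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

A timeout can only suspect a failure, not prove it: a slow network looks the same as a crashed process, so the timeout value trades detection speed against false alarms.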
  • 33. Error Recovery and Resynchronization: • Error Handling: Processes should be designed to handle errors gracefully. They should have appropriate error handling mechanisms, such as exception handling and retries, to recover from transient failures. • Checkpointing: Checkpointing involves periodically saving the state of a process to stable storage. In case of failure, the process can recover by restoring its state from the last checkpoint. This ensures that the process can resume its operation without losing significant progress. • State Resynchronization: If a failed process needs to be restarted or migrated, its state may need to be resynchronized with other processes. This can be achieved through techniques like message replay or state transfer from other replicas.
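Retrying after a transient failure, as described under error handling, is often combined with exponential backoff. The sketch below assumes transient failures surface as `ConnectionError`; the `retry` helper and its parameters are illustrative.

```python
import time


def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `operation` until it succeeds or the attempts are exhausted.

    The wait between attempts doubles each time: base_delay, 2 * base_delay, ...
    The `sleep` parameter exists so callers can substitute their own wait function.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                      # give up: propagate the last error
            sleep(base_delay * 2 ** attempt)
```

Retries are only safe for operations that are idempotent or otherwise protected against being applied twice.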
  • 34. Load Balancing and Elasticity: • Load Distribution: Processes should be dynamically balanced across nodes to ensure that the workload is evenly distributed. Load balancing mechanisms, such as request routing algorithms or dynamic resource allocation, can help achieve this. • Elastic Scaling: Process resilience involves the ability to scale the system based on varying workload demands. • Processes should be able to scale up or down dynamically to handle increased or decreased load, respectively.
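One simple request routing policy is to send each new request to the node with the fewest requests in flight. The `LeastLoadedBalancer` class below is a hypothetical sketch of that policy; it tracks load in memory and ignores node failures.

```python
class LeastLoadedBalancer:
    """Routes each request to the node with the fewest active requests."""

    def __init__(self, nodes):
        self.load = {node: 0 for node in nodes}

    def route(self):
        """Pick the least-loaded node (ties broken by name) and count the request."""
        node = min(self.load, key=lambda n: (self.load[n], n))
        self.load[node] += 1
        return node

    def finish(self, node):
        """Record that a request on the node has completed."""
        self.load[node] -= 1
```

Elastic scaling fits the same structure: adding an entry to `load` brings a new node into rotation, and it immediately attracts traffic because its count starts at zero.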
  • 35. Next… Security Mechanisms in Distributed Systems
  • 36. Security Challenges in Distributed Systems • Distributed systems face various security challenges due to their distributed nature and the potential for attacks on different components. • Some common security challenges include: • Authentication and Authorization: Ensuring the identity of participants in the system and granting appropriate access rights to resources. • Data Confidentiality: Protecting sensitive data from unauthorized access or disclosure during transmission and storage.
  • 37. Cont’d… • Data Integrity: Ensuring that data remains unchanged and uncorrupted during transmission and storage. • Availability: Preventing denial-of-service attacks and ensuring that the system remains accessible to legitimate users. • Secure Communication: Establishing secure channels for communication between distributed components to prevent eavesdropping or tampering.
  • 38. Security Mechanisms in Distributed Systems • To address the security challenges, various security mechanisms are employed in distributed systems. These mechanisms include: • Encryption: Encrypting data using cryptographic algorithms to protect its confidentiality. Protecting integrity as well requires authenticated encryption or a separate message authentication code. • Encryption ensures that only authorized recipients can decrypt and access the data. • Digital Signatures: Using digital signatures to verify the authenticity and integrity of messages. • Digital signatures provide a way to verify that a message has been sent by the claimed sender and that it hasn't been modified during transmission.
  • 39. Cont’d… • A digital signature is a cryptographic mechanism used to verify the authenticity, integrity, and non-repudiation of digital documents or messages. • It provides a way to ensure that a message or document comes from a particular sender and has not been altered during transmission. Here's an overview of how digital signatures are implemented:
  • 40. Cont’d… • Key Pair Generation: • Key Pair: The sender generates a key pair consisting of a public key and a private key. The private key is kept secret. The public key is shared with others and is used for verifying the digital signatures created by the private key. • Signature Generation: • Hashing: The sender calculates a cryptographic hash (such as SHA-256) of the document or message to be signed. The hash function produces a fixed-length output that serves as a fingerprint of the document. • Signing: The sender then encrypts the hash value using their private key. This encrypted hash, known as the digital signature, is specific to the document and the sender's private key.
  • 41. Cont’d… • Signature Verification: • Signature Extraction: The recipient of the document retrieves the digital signature attached to the document. • Public Key Usage: The recipient uses the sender's public key to decrypt the digital signature, obtaining the original hash value. • Hash Calculation: The recipient independently calculates the hash of the received document using the same hash function. • Comparison: The recipient compares the calculated hash with the decrypted hash obtained from the digital signature. If the two values match, it means the document has not been tampered with during transmission and that the sender is the legitimate signer.
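The hash, sign, and verify steps above can be traced with a toy textbook-RSA sketch. This is for illustration only and is NOT secure: the key is built from two small, publicly known Mersenne primes, there is no padding, and the hash is reduced modulo N. The constants and function names are assumptions made for the example; real systems use a vetted cryptographic library. Computing the modular inverse with `pow(E, -1, ...)` requires Python 3.8 or later.

```python
import hashlib

# Toy key material (insecure, for illustration only).
P = 2**61 - 1                       # a known Mersenne prime
Q = 2**89 - 1                       # another known Mersenne prime
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent


def digest_as_int(message: bytes) -> int:
    """Hash the message with SHA-256 and map the digest into the range [0, N)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Signing: apply the private exponent to the hash value."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Verification: recover the hash with the public key and compare."""
    return pow(signature, E, N) == digest_as_int(message)
```

Anyone holding the public pair (N, E) can run `verify`, but only the holder of D can produce a signature that passes it.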
  • 42. Cont’d… • Access Control: Implementing access control mechanisms to enforce authorization policies and restrict access to resources based on user roles and permissions. • Firewalls and Intrusion Detection Systems (IDS): Deploying firewalls and IDS to monitor network traffic, detect and prevent unauthorized access attempts, and identify potential security breaches. • Secure Communication Protocols: Using secure communication protocols such as SSL/TLS (Secure Sockets Layer/Transport Layer Security) to establish secure connections and protect data during transmission.
  • 43. Cont’d… • Security Auditing and Logging: Implementing auditing and logging mechanisms to record and monitor system activities, detect security incidents, and facilitate forensic analysis in case of security breaches. • Logging mechanisms • Log Generation: Distributed systems generate logs that capture relevant events and activities. Logs can include information such as user actions, system events, network activities, authentication attempts, and error conditions. • Log Format: Logs are typically stored in a standardized format, such as plain text or structured formats like JSON or XML. The log format should include essential details such as timestamps, event descriptions, source IP addresses, and other relevant metadata.
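A structured log entry of the kind described above might be produced as follows. The field names (`timestamp`, `event`, `source_ip`, `user`) and the `format_log_entry` helper are assumptions for this sketch, not a standard schema.

```python
import json
from datetime import datetime, timezone


def format_log_entry(event, source_ip, user, timestamp=None):
    """Serialize one log event as a single line of JSON."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    entry = {
        "timestamp": timestamp.isoformat(),   # ISO 8601, with UTC offset
        "event": event,
        "source_ip": source_ip,
        "user": user,
    }
    return json.dumps(entry, sort_keys=True)


if __name__ == "__main__":
    print(format_log_entry("login_failed", "203.0.113.7", "alice"))
```

Recording timestamps in UTC makes it simpler to correlate entries collected from nodes in different time zones.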
  • 44. Cont’d… • Log Collection: Logs from distributed components are collected and consolidated in a centralized location or a distributed log management system. • This centralization enables easier analysis, correlation, and search capabilities across multiple log sources.
  • 45. Auditing Mechanism: • Security Policies: Establishing security policies and standards is crucial for auditing. • These policies define what activities are considered normal and acceptable within the system, and what actions should be flagged as potential security incidents. • Event Monitoring: Auditing involves monitoring and analyzing logged events to identify potential security issues or violations of security policies. • This may involve using automated tools, intrusion detection systems, or manual reviews of log entries.
  • 46. Cont’d… • Alerting and Notifications: Auditing systems can be configured to generate alerts or notifications when specific security-related events or patterns are detected. • These alerts can be sent to security administrators or a Security Operations Center (SOC) for immediate response and investigation.
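As one possible example of an alerting rule, the sketch below flags a source address that produces several failed logins within a short time window. The event format `(timestamp, source_ip, outcome)`, the threshold, and the window length are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque


def failed_login_alerts(events, threshold=3, window=60):
    """Return (timestamp, source_ip) for each event that trips the rule.

    `events` is an iterable of (timestamp_seconds, source_ip, outcome) tuples,
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)   # source_ip -> timestamps of recent failures
    alerts = []
    for timestamp, source_ip, outcome in events:
        if outcome != "failure":
            continue
        failures = recent[source_ip]
        failures.append(timestamp)
        # Drop failures that fall outside the sliding window.
        while failures and timestamp - failures[0] > window:
            failures.popleft()
        if len(failures) >= threshold:
            alerts.append((timestamp, source_ip))
    return alerts
```

In a deployment, the returned alerts would be forwarded to administrators or a SOC rather than collected in a list.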
  • 47. Thank you for your attention.