Chapter Introductionn to distributed system .pptx

Introduction to Distributed
Systems

Introduction to DS
• A distributed system is a collection of autonomous computers
connected through a network, working together as a single cohesive
system.
• Characteristics: Distribution of resources, concurrency, and independent
failures (one component can fail while the others keep running).
• Examples: Internet, cloud computing, peer-to-peer networks.

Why Distributed Systems?
• Advantages: Increased performance, scalability, reliability, fault
tolerance, and resource sharing.
• Challenges: Coordination, communication, consistency, and security.

Design Goals of Distributed Systems
• Scalability: The system should be able to handle an increasing
number of users and resources.
• Reliability: The system should continue to function despite individual
component failures.
• Performance: The system should provide efficient and timely
responses to user requests.
• Transparency: The system should appear as a single, unified entity to
its users.
• Flexibility: The system should be adaptable to changing requirements
and environments.

Types of Distributed Systems
• Cluster Computing Systems:
• Cluster computing systems combine multiple machines or servers to form a
cluster that works together to perform large-scale computational tasks.
They distribute the workload among cluster nodes and often leverage
parallel processing techniques. Examples include Apache Hadoop and
Apache Spark.
• Grid Computing Systems:
• Grid computing systems connect geographically distributed resources to
form a virtual supercomputer. They enable the sharing of computing
power, storage, and data across different organizations or institutions. Grid
systems are typically used for scientific computing, research collaborations,
and resource-intensive applications.

Cont’d…
• Cloud Computing Systems:
• Cloud computing systems provide on-demand access to a pool of
computing resources, including virtual machines, storage, and services,
over the internet. They offer scalability, flexibility, and pay-per-use billing
models. Examples include Amazon Web Services (AWS), Microsoft Azure,
and Google Cloud Platform (GCP).
• Internet of Things (IoT) Systems:
• IoT systems connect a large number of devices and sensors, often
geographically distributed, to collect and exchange data. They involve
distributed processing, data aggregation, and coordination among devices
and services. IoT systems are used in various domains such as smart
homes, industrial automation, and smart cities.

Cont’d…
• Distributed File Systems:
• Distributed file systems are designed to provide a unified view of file
storage across multiple machines. They distribute file data and metadata
across nodes, allowing clients to access and manipulate files as if they were
stored on a single machine. Examples include the Google File System (GFS)
and the Hadoop Distributed File System (HDFS).
• Distributed Database Systems:
• Distributed database systems store and manage data across multiple
nodes to provide scalability, fault tolerance, and improved performance.
They distribute data across nodes and support distributed query processing
and transaction management. Examples include Apache Cassandra and
Apache HBase.

Architectural Models
• Client-Server Model
• Clients request services from servers.
• Servers provide services to clients.
• Example: Web applications with web browsers (clients) and web servers.
• Peer-to-Peer Model
• Peers communicate and collaborate directly with each other.
• Each peer can act as both a client and a server.
• Example: File sharing networks like BitTorrent.
• Hybrid Models
• Combine elements of both client-server and peer-to-peer models.
• Example: Distributed databases with dedicated server nodes and peer replication.

Communication in Distributed Systems
• Communication Models
• Message Passing: Communication through explicit message exchanges.
• Remote Procedure Call (RPC): Invoking procedures on remote machines.
• Publish-Subscribe: Subscribers receive notifications about events from publishers.
• Message Queues: Messages are stored in queues for asynchronous processing.
• Communication Protocols
• TCP/IP: Transmission Control Protocol over the Internet Protocol; TCP provides
reliable, connection-oriented communication.
• UDP: User Datagram Protocol for connectionless communication with no delivery guarantees.
• HTTP: Hypertext Transfer Protocol for web-based communication.
• MQTT: Message Queuing Telemetry Transport for lightweight publish-subscribe
messaging.
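
The publish-subscribe model listed above can be sketched in a few lines. This is a minimal in-process illustration, not a real broker such as an MQTT server; the Broker class, its method names, and the topic strings are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class Broker:
    """Hypothetical in-process stand-in for a publish-subscribe broker."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Register a callback that is invoked for every message on the topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        # Deliver the message to each subscriber; return how many received it.
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = Broker()
received: List[str] = []
broker.subscribe("sensors/temperature", received.append)
delivered = broker.publish("sensors/temperature", "21.5")
ignored = broker.publish("sensors/humidity", "40")  # nobody subscribed
```

The publisher never names its receivers; it only names a topic, which is what decouples the two sides.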

Next…
Chapter Five
Consistency Models

Consistency Models
• Consistency models define the guarantees about the order and visibility of
data in a distributed system.
• A replica refers to a copy of data or a component in a distributed system
that is stored and maintained on multiple nodes.
• Some commonly discussed consistency models include:
• Strong Consistency: In a strongly consistent system, all replicas show the
same data at all times.
• Any read operation immediately reflects the most recent write operation.
• Achieving strong consistency often requires coordination and
synchronization between replicas, which can impact performance and
availability.

Types of Strong Consistency
• Two-Phase Locking (2PL): This mechanism ensures that conflicting
operations on shared data are serialized.
• A transaction acquires locks before accessing shared data (growing
phase) and, once it has released any lock, acquires no new ones
(shrinking phase).
• 2PL guarantees conflict serializability but can introduce contention and
deadlocks, which affect system performance.
• Distributed Transaction Commit Protocol: Atomic commitment across
nodes (all participants commit or all abort) can be enforced using
distributed commit protocols, such as the Two-Phase Commit (2PC) or
Three-Phase Commit (3PC).
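
A simplified sketch of the Two-Phase Commit flow may help here. It assumes a single coordinator and in-memory participants, and it leaves out timeouts, durable logging, and coordinator failure, all of which a real protocol has to handle. The Participant class and the two_phase_commit function are hypothetical names.

```python
from typing import List


class Participant:
    """Hypothetical participant that votes in phase 1 and obeys in phase 2."""

    def __init__(self, name: str, can_commit: bool) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: vote yes (and hold resources) or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    # Phase 1 (voting): the coordinator asks every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only if every vote was yes, else abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

A single "no" vote aborts the transaction everywhere, which is what makes the outcome atomic.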

Cont’d…
• Eventual Consistency: Eventual consistency allows replicas to
temporarily show different data but guarantees that they will
eventually converge to a consistent state.
• This model relaxes the synchronization requirements, allowing
replicas to operate independently and asynchronously.
• It is often used in systems that prioritize availability and partition
tolerance over strict consistency.

Cont’d…
• Vector Clocks: Vector clocks are used to track the causal ordering of events
in a distributed system. Each replica maintains a vector clock that is
updated with each event. The vector clock information helps determine the
relative ordering of events across replicas.
• Anti-Entropy and Merkle Trees: Anti-entropy mechanisms, often built on
gossip protocols, periodically exchange updates between replicas to
synchronize data. Merkle trees (trees of hashes over ranges of data) let two
replicas compare hashes and efficiently locate the blocks that differ, so
only those blocks need to be reconciled.
• Conflict Resolution and Convergence: In eventual consistency, conflicts
may arise when concurrent updates occur on different replicas. Conflict
resolution techniques, such as Last-Writer-Wins (LWW) or keeping all
concurrent versions for later merging (multi-value approaches), are used to
reconcile conflicting updates and converge the data to a consistent state
over time.
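
The vector clock idea above can be illustrated with a small sketch. It assumes clocks are plain dictionaries mapping a replica name to a counter; the function names tick, merge, and compare are chosen for this example only.

```python
from typing import Dict

VectorClock = Dict[str, int]


def tick(clock: VectorClock, node: str) -> VectorClock:
    # A local event on `node` increments that node's own counter.
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    # On receiving a message, take the element-wise maximum of both clocks.
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}


def compare(a: VectorClock, b: VectorClock) -> str:
    # Returns "before", "after", "equal", or "concurrent".
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


a = tick({}, "A")            # event on replica A
b = tick({}, "B")            # independent event on replica B
c = tick(merge(a, b), "A")   # A learns about B's event, then acts
```

Here a and b are reported as concurrent, which is exactly the case where a conflict resolution rule has to decide the outcome.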

Cont’d…
• Causal Consistency: Causal consistency ensures that the order of
causally related events is preserved across replicas.
• If one event causally depends on another, all replicas must observe
the same causal order. However, the ordering of unrelated events can
be different across replicas.
• Read/Write Consistency: Some systems provide different consistency
levels for read and write operations.
• For example, a system may offer strong consistency for write
operations to ensure data integrity but provide eventual consistency
for read operations to improve performance.

Cont’d…
• Dependency Tracking: Causal consistency mechanisms track the
causal dependencies between events. This can be done through
explicit metadata or implicit tracking based on the ordering of events.
• Lamport Clocks: Lamport clocks assign a logical timestamp to each
event such that if one event causally precedes another, it has the
smaller timestamp. The ordering is consistent with causality, but
Lamport timestamps alone cannot tell whether two events are
concurrent.
• Vector Clocks: Vector clocks, as mentioned earlier, are also used in
causal consistency mechanisms to track and enforce the causal
ordering of events across replicas.
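
As a minimal sketch of the Lamport clock rules: a process increments its counter on every local event or send, and on receive it jumps past the timestamp carried by the message. The class and method names are illustrative.

```python
class LamportClock:
    """Logical clock following Lamport's increment-and-max rules."""

    def __init__(self) -> None:
        self.time = 0

    def local_event(self) -> int:
        # Every local event advances the clock by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; the returned timestamp travels with the message.
        return self.local_event()

    def receive(self, message_time: int) -> int:
        # Move past both the local clock and the sender's timestamp.
        self.time = max(self.time, message_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.local_event()          # p's clock is now 1
stamp = p.send()         # p's clock is now 2
q_time = q.receive(stamp)
```

The receive on q is timestamped after the send on p, so the causal order is reflected in the numbers.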

Replication Techniques
• Replication involves creating and maintaining multiple copies of data
or components across distributed systems.
• Replication offers several benefits, including increased availability,
fault tolerance, and performance.
• Here are some common replication techniques:

Cont’d…
• Primary-Backup Replication: In this approach, one replica (the primary)
handles all client requests and updates the backup replicas.
• If the primary replica fails, one of the backups takes over its role.
• This technique keeps a consistent copy of the data available, provided
updates reach a backup before the primary fails.
• State Machine Replication: State machine replication involves executing
the same set of commands on all replicas in the same order.
• Each replica applies the commands to its local state machine, ensuring that
they all reach the same state.
• This technique provides strong consistency but can be resource-intensive
due to the need for synchronous communication and coordination.
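
State machine replication can be illustrated with a deterministic key-value store. The sketch assumes the replicas have already agreed on the command order (in practice a consensus protocol does that); the KeyValueReplica class and the command tuples are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Command = Tuple[str, str, Any]  # (operation, key, value)


class KeyValueReplica:
    """Deterministic state machine: same commands in, same state out."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}

    def apply(self, command: Command) -> None:
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)


# The agreed-upon, totally ordered command log.
log: List[Command] = [
    ("set", "x", 1),
    ("set", "y", 2),
    ("delete", "x", None),
    ("set", "y", 3),
]

replicas = [KeyValueReplica() for _ in range(3)]
for replica in replicas:
    for command in log:
        replica.apply(command)
```

Because apply is deterministic and every replica sees the same order, all three end in the same state.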

Cont’d…
• Quorum-Based Replication: Quorum-based replication requires a
certain number of replicas to agree on a write operation before it is
considered successful.
• The quorum can be a majority, a fixed number, or a percentage of the
replicas.
• Quorum-based replication balances the trade-off between
consistency and performance, allowing systems to continue operating
as long as a sufficient number of replicas are available.
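
A common way to reason about quorum sizes, not stated on the slide but standard in quorum systems, is the overlap rule: with N replicas, a write quorum W and a read quorum R are guaranteed to share at least one replica when R + W > N. The helper names below are illustrative.

```python
def majority(n: int) -> int:
    # Smallest number of replicas that is more than half of n.
    return n // 2 + 1


def quorums_overlap(n: int, w: int, r: int) -> bool:
    # Every read quorum intersects every write quorum when r + w > n.
    return r + w > n


def write_succeeds(acks: int, w: int) -> bool:
    # A write is considered successful once w replicas have acknowledged it.
    return acks >= w
```

For example, with N = 3 the choice W = 2, R = 2 overlaps, while W = 1, R = 1 does not, so a read could miss the latest write.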

Next…
Synchronization in distributed systems

Synchronization in distributed systems
• Synchronization in distributed systems refers to the coordination and
ordering of actions or events across multiple processes or
components in the system.
• It involves ensuring that processes execute their tasks in a mutually
agreed-upon and consistent manner.
• Synchronization is important in distributed systems for several
reasons:

Cont’d…
• Consistency: Synchronization helps maintain data consistency and coherence
across replicas or shared resources. It ensures that concurrent operations do not
lead to conflicts or inconsistencies in the system.
• Correctness: Synchronization is essential for correctness in distributed systems. It
allows processes to coordinate their actions, enforce dependencies, and ensure
that critical operations are executed in the correct order.
• Mutual Exclusion: Synchronization mechanisms, such as locks or semaphores,
enable processes to access shared resources in a mutually exclusive manner. This
prevents race conditions and ensures that only one process can access a resource
at a time.
• Coordination: Synchronization facilitates coordination and cooperation between
processes. It enables processes to communicate, exchange data, and synchronize
their activities to achieve a common goal.

1. Locking Mechanisms:
• Distributed Locks: Processes can acquire and release distributed locks
to ensure exclusive access to shared resources. Distributed mutual
exclusion algorithms, such as the Ricart-Agrawala algorithm or Lamport's
mutual exclusion algorithm, enable processes to coordinate their access
to critical sections.
• Two-Phase Locking (2PL): 2PL is a concurrency control mechanism
that ensures serializability by acquiring locks on resources before
accessing them and acquiring no new locks once any lock has been
released.

2. Message Passing:
• Message Ordering: Messages exchanged between processes can be
ordered to ensure causality or sequential consistency.
• Techniques like Lamport timestamps or vector clocks are used to
establish the order of events in a distributed system.
• Barrier Synchronization: Processes can synchronize their activities
using barriers, where each process waits until all participating
processes have reached the barrier before continuing execution.
• Barriers are commonly used to synchronize parallel or distributed
computations.
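
Barrier synchronization can be demonstrated on a single machine with Python's threading.Barrier; threads stand in for distributed processes here, which is a simplifying assumption. The run_phases function is a name invented for this sketch.

```python
import threading
from typing import List, Tuple


def run_phases(num_workers: int) -> List[Tuple[str, int]]:
    barrier = threading.Barrier(num_workers)
    events: List[Tuple[str, int]] = []
    lock = threading.Lock()

    def worker(worker_id: int) -> None:
        with lock:
            events.append(("phase1", worker_id))
        # Block until every worker has finished phase 1.
        barrier.wait()
        with lock:
            events.append(("phase2", worker_id))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


events = run_phases(4)
```

No worker records phase 2 until all workers have recorded phase 1, regardless of how the threads are scheduled.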

3. Consensus and Coordination Protocols:
• Distributed Consensus: Distributed consensus protocols, such as Paxos or
Raft, enable processes to agree on a common value or make coordinated
decisions despite failures or network partitions, as long as a majority of
the processes can still communicate.
• Distributed Coordination: Mechanisms like distributed semaphores or
condition variables allow processes to coordinate their activities based on
certain conditions or events. Processes can signal or wait for specific
conditions to be met before proceeding.
• Clock Synchronization: Distributed systems often require synchronized
clocks to order events accurately. Clock synchronization protocols like the
Network Time Protocol (NTP) or the Precision Time Protocol (PTP) are used
to ensure that clocks across different nodes are closely aligned.
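
The core arithmetic behind NTP-style clock synchronization can be sketched from four timestamps: the client send time t0 and receive time t3 (client clock), and the server receive time t1 and send time t2 (server clock). The estimate assumes the network delay is roughly symmetric; the function name and the sample values are made up for illustration.

```python
from typing import Tuple


def estimate_offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> Tuple[float, float]:
    # Offset: how far the server clock is ahead of the client clock.
    offset = ((t1 - t0) + (t2 - t3)) / 2
    # Round-trip network delay, excluding the server's processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Illustrative values: client clock is 5 s behind the server, each network
# hop takes 0.1 s, and the server spends 0.05 s handling the request.
offset, delay = estimate_offset_and_delay(t0=100.0, t1=105.1, t2=105.15, t3=100.25)
```

With these sample values the estimated offset is about 5 seconds and the round-trip delay about 0.2 seconds.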

Next…
Fault Tolerance and Recovery

Fault Tolerance and Recovery
• Fault Types
• Fail-Stop: A process halts and stays halted; other processes can detect the failure.
• Crash-Recovery: A process halts but can recover and resume.
• Byzantine: A process exhibits arbitrary, possibly malicious, behavior.
• Fault Tolerance Techniques
• Redundancy: Replicating components to provide backup.
• Error Detection and Correction: Using checksums and error-correcting codes.
• Checkpointing: Periodically saving the system state to enable recovery.
• Recovery Protocols
• Restart-Based Recovery: Restarting failed components from a clean state.
• Rollback-Recovery: Returning the system to a previously consistent state.
• Checkpoint-Based Recovery: Using saved checkpoints to restore the system
state.

Cont’d…
• Checkpointing: Checkpointing is a technique used to periodically save
the system's state.
• By creating checkpoints at certain intervals, the system can recover
from failures by restoring the state to a previously consistent point.
• Checkpointing involves saving critical data and metadata to a stable
storage medium to enable recovery and resumption of operations.
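
One way to sketch checkpointing on a single node is to write the state to a temporary file and then rename it over the previous checkpoint, so that a crash mid-write never leaves a half-written checkpoint. The JSON format, file layout, and function names are assumptions made for this example.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_checkpoint(path: str, state: Dict[str, Any]) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the data to stable storage
    # Atomically replace the old checkpoint with the new one.
    os.replace(tmp_path, path)


def load_checkpoint(path: str, default: Dict[str, Any]) -> Dict[str, Any]:
    # Fall back to the default state when no checkpoint exists yet.
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

After a restart, a process would call load_checkpoint and continue from the saved state instead of starting over.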

Process resilience
• Process resilience in distributed systems refers to the ability of
individual processes or components within the system to withstand
failures and continue functioning properly.
• It involves designing and implementing mechanisms that enable
processes to recover from failures, adapt to changing conditions, and
maintain system availability.
• Here are some key aspects of process resilience in distributed
systems:

Fault Detection and Failure Handling:
• Monitoring: Processes should be continuously monitored to detect
failures or abnormal behavior. This can be done through heartbeat
mechanisms, timeouts, or periodic health checks.
• Failure Handling: When a failure is detected, the system should have
mechanisms in place to handle it. This may involve restarting the
failed process, migrating it to a different node, or distributing its
workload among other processes.
• Redundancy and Replication: Replicating critical processes across
multiple nodes provides fault tolerance. If one replica fails, another
replica can take over its responsibilities. This ensures that the system
can continue functioning even in the presence of failures.
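
The heartbeat-and-timeout approach to monitoring can be sketched as follows. Timestamps are passed in explicitly to keep the example deterministic; the HeartbeatMonitor class and its timeout value are hypothetical, and a node that is merely slow would also be suspected by this simple rule.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Suspects a node when no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        # Record the time of the most recent heartbeat from this node.
        self.last_seen[node] = now

    def suspected(self, now: float) -> List[str]:
        # Nodes whose last heartbeat is older than the timeout.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat("node-a", now=0.0)
monitor.heartbeat("node-b", now=0.0)
monitor.heartbeat("node-a", now=2.5)   # node-b stays silent
```

At time 4.0 only node-b has been silent for longer than the timeout, so it is the one handed to the failure handling step.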

Error Recovery and Resynchronization:
• Error Handling: Processes should be designed to handle errors gracefully.
They should have appropriate error handling mechanisms, such as
exception handling and retries, to recover from transient failures.
• Checkpointing: Checkpointing involves periodically saving the state of a
process to stable storage. In case of failure, the process can recover by
restoring its state from the last checkpoint. This ensures that the process
can resume its operation without losing significant progress.
• State Resynchronization: If a failed process needs to be restarted or
migrated, its state may need to be resynchronized with other processes.
This can be achieved through techniques like message replay or state
transfer from other replicas.
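
Retrying after a transient failure is often combined with exponential backoff, so that a struggling service is not flooded with immediate retries. The sketch below assumes a hypothetical TransientError exception and lets the caller inject the sleep function, which makes it easy to test.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Only errors marked as transient are retried; anything else propagates immediately, which keeps real bugs visible.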

Load Balancing and Elasticity:
• Load Distribution: Processes should be dynamically balanced across
nodes to ensure that the workload is evenly distributed. Load
balancing mechanisms, such as request routing algorithms or
dynamic resource allocation, can help achieve this.
• Elastic Scaling: Process resilience involves the ability to scale the
system based on varying workload demands.
• Processes should be able to scale up or down dynamically to handle
increased or decreased load, respectively.
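
Two simple request routing strategies illustrate the load distribution idea: round-robin, which cycles through the nodes, and least-loaded, which picks the node with the fewest active requests. The class, function, and node names are invented for this sketch.

```python
import itertools
from typing import Dict, Iterable


class RoundRobinBalancer:
    """Hands out nodes in a fixed rotating order."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._cycle = itertools.cycle(list(nodes))

    def pick(self) -> str:
        return next(self._cycle)


def least_loaded(active_requests: Dict[str, int]) -> str:
    # Choose the node currently handling the fewest requests.
    return min(active_requests, key=active_requests.get)


balancer = RoundRobinBalancer(["n1", "n2", "n3"])
order = [balancer.pick() for _ in range(4)]
target = least_loaded({"n1": 5, "n2": 2, "n3": 7})
```

Round-robin needs no load information at all, while least-loaded adapts to uneven request costs at the price of tracking per-node load.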

Next…
Security Mechanisms in Distributed Systems

Security Challenges in Distributed Systems
• Distributed systems face various security challenges due to their
distributed nature and the potential for attacks on different
components.
• Some common security challenges include:
• Authentication and Authorization: Ensuring the identity of
participants in the system and granting appropriate access rights to
resources.
• Data Confidentiality: Protecting sensitive data from unauthorized
access or disclosure during transmission and storage.

Cont’d…
• Data Integrity: Ensuring that data remains unchanged and
uncorrupted during transmission and storage.
• Availability: Preventing denial-of-service attacks and ensuring that
the system remains accessible to legitimate users.
• Secure Communication: Establishing secure channels for
communication between distributed components to prevent
eavesdropping or tampering.

Security Mechanisms in Distributed Systems
• To address the security challenges, various security mechanisms are
employed in distributed systems. These mechanisms include:
• Encryption: Encrypting data using cryptographic algorithms to protect its
confidentiality (and, when authenticated encryption is used, its integrity).
• Encryption ensures that only recipients holding the right key can decrypt
and access the data.
• Digital Signatures: Using digital signatures to verify the authenticity and
integrity of messages.
• Digital signatures provide a way to verify that a message has been sent by
the claimed sender and that it hasn't been modified during transmission.

Cont’d…
• A digital signature is a cryptographic mechanism used to verify the
authenticity, integrity, and non-repudiation of digital documents or
messages.
• It provides a way to ensure that a message or document comes from
a particular sender and has not been altered during transmission.
Here's an overview of how digital signatures are implemented:

Cont’d…
• Key Pair Generation:
• Key Pair: The sender generates a key pair consisting of a public key and a
private key. The private key is kept secret; the public key is shared with
others and is used for verifying the digital signatures created with the
private key.
• Signature Generation:
• Hashing: The sender computes a cryptographic hash (such as SHA-256)
of the document or message to be signed. The hash function produces
a fixed-length digest that, for practical purposes, uniquely represents the
document.
• Signing: The sender then transforms the hash value using their private key
(in RSA-style schemes this is often described as encrypting the hash). The
result, known as the digital signature, is specific to the document and the
sender's private key.

Cont’d…
• Signature Verification:
• Signature Extraction: The recipient of the document retrieves the digital
signature attached to the document.
• Public Key Usage: The recipient applies the sender's public key to the
digital signature, recovering the hash value that was signed.
• Hash Calculation: The recipient independently calculates the hash of the
received document using the same hash function.
• Comparison: The recipient compares the calculated hash with the hash
recovered from the digital signature. If the two values match, the document
has not been altered since it was signed, and the signature was produced
with the private key that matches the sender's public key.
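
The hash, sign, and verify steps can be traced with a toy textbook-RSA sketch using only the Python standard library. This is for illustration only and is NOT secure: the primes are tiny and publicly known, and there is no padding scheme. The constants and function names are assumptions of this example; real systems should use a vetted cryptography library.

```python
import hashlib

# Toy key pair built from two well-known Mersenne primes (insecure on purpose).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)


def digest(message: bytes) -> int:
    # SHA-256 hash reduced into the range of the toy modulus.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    # Signing: apply the private exponent to the hash.
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    # Verification: apply the public exponent and compare with a fresh hash.
    return pow(signature, E, N) == digest(message)


signature = sign(b"transfer 100 to alice")
```

Verifying the original message succeeds, while verifying an altered message against the same signature fails, because its hash no longer matches the one recovered from the signature.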

Cont’d…
• Access Control: Implementing access control mechanisms to enforce
authorization policies and restrict access to resources based on user
roles and permissions.
• Firewalls and Intrusion Detection Systems (IDS): Deploying firewalls
and IDS to monitor network traffic, detect and prevent unauthorized
access attempts, and identify potential security breaches.
• Secure Communication Protocols: Using secure communication
protocols such as TLS (Transport Layer Security, the successor to the
now-deprecated SSL, Secure Sockets Layer) to establish secure
connections and protect data during transmission.

Cont’d…
• Security Auditing and Logging: Implementing auditing and logging
mechanisms to record and monitor system activities, detect security
incidents, and facilitate forensic analysis in case of security breaches.
• Logging mechanisms
• Log Generation: Distributed systems generate logs that capture relevant
events and activities. Logs can include information such as user actions,
system events, network activities, authentication attempts, and error
conditions.
• Log Format: Logs are typically stored in a standardized format, such as
plain text or structured formats like JSON or XML. The log format should
include essential details such as timestamps, event descriptions, source IP
addresses, and other relevant metadata.

Cont’d…
• Log Collection: Logs from distributed components are collected and
consolidated in a centralized location or a distributed log
management system.
• This centralization enables easier analysis, correlation, and search
capabilities across multiple log sources.
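
A structured log entry of the kind described above might be produced like this. The field names (timestamp, event, source_ip, user, outcome) are one possible choice, not a standard, and make_log_entry is a hypothetical helper.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def make_log_entry(
    event: str,
    source_ip: str,
    user: str,
    outcome: str,
    timestamp: Optional[datetime] = None,
) -> str:
    # Use the current UTC time unless the caller supplies a timestamp.
    ts = timestamp or datetime.now(timezone.utc)
    record = {
        "timestamp": ts.isoformat(),
        "event": event,
        "source_ip": source_ip,
        "user": user,
        "outcome": outcome,
    }
    # One JSON object per line is easy to ship to a central collector.
    return json.dumps(record, sort_keys=True)


entry = make_log_entry(
    "login_attempt", "203.0.113.7", "alice", "failure",
    timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

Using UTC timestamps in every component makes it much easier to correlate entries once logs from different nodes are merged.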

Auditing Mechanism:
• Security Policies: Establishing security policies and standards is
crucial for auditing.
• These policies define what activities are considered normal and
acceptable within the system, and what actions should be flagged as
potential security incidents.
• Event Monitoring: Auditing involves monitoring and analyzing logged
events to identify potential security issues or violations of security
policies.
• This may involve using automated tools, intrusion detection systems,
or manual reviews of log entries.

Cont’d…
• Alerting and Notifications: Auditing systems can be configured to
generate alerts or notifications when specific security-related events
or patterns are detected.
• These alerts can be sent to security administrators or a Security
Operations Center (SOC) for immediate response and investigation.
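
A very small example of pattern-based alerting is to flag source addresses with repeated failed logins. The event dictionaries, the threshold of three, and the function name are assumptions for this sketch; a real deployment would also consider time windows and would forward the result to an alerting channel.

```python
from collections import Counter
from typing import Dict, Iterable, List


def failed_login_alerts(events: Iterable[Dict[str, str]], threshold: int = 3) -> List[str]:
    # Count failed login attempts per source IP address.
    failures = Counter(
        e["source_ip"]
        for e in events
        if e["event"] == "login_attempt" and e["outcome"] == "failure"
    )
    # Report every address that reached the threshold.
    return sorted(ip for ip, count in failures.items() if count >= threshold)


sample_events = [
    {"event": "login_attempt", "source_ip": "203.0.113.7", "outcome": "failure"},
    {"event": "login_attempt", "source_ip": "203.0.113.7", "outcome": "failure"},
    {"event": "login_attempt", "source_ip": "198.51.100.4", "outcome": "failure"},
    {"event": "login_attempt", "source_ip": "203.0.113.7", "outcome": "failure"},
    {"event": "login_attempt", "source_ip": "198.51.100.4", "outcome": "success"},
]
alerts = failed_login_alerts(sample_events)
```

Only the address with three failures is reported; the one with a single failure followed by a success is not.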
