1. 20IT703 - PRINCIPLES OF
DISTRIBUTED SYSTEMS
IV Year / VII Semester
Prepared by,
M. Suganthi., M.E., (Ph.D)
AP / IT
2. OBJECTIVES
• The student should be made to:
Learn the fundamentals of the distributed environment.
Understand the concepts of processes and synchronization in a distributed
environment.
Gain knowledge about peer-to-peer overlay networks.
Acquire knowledge of fault tolerance techniques and security in
distributed systems.
Study the concepts of network file systems and middleware
technologies.
3. UNIT I - INTRODUCTION
Introduction - Examples of Distributed Systems - Focusing on
Resource Sharing - Challenges - API for Internet Protocol -
External Data Representation and Marshalling - Multicast
Communication - Remote Procedure Call - Group
Communication - Publish-Subscribe Systems.
4. Introduction
A distributed system is a collection of autonomous computer systems that are physically separated but connected by a centralized computer network that is equipped with distributed system software.
5. Contd.
• These autonomous computers communicate with one another by sharing resources and files and by performing the tasks assigned to them.
6. Contd.
• Also known as distributed computing or a distributed database, a distributed system is a collection of independent components located on different machines that share messages with each other in order to achieve common goals.
7. Contd.
• A distributed system is a computing
environment in which various components are
spread across multiple computers (or other
computing devices) on a network. These
devices split up the work, coordinating their
efforts to complete the job more efficiently
than if a single device had been responsible for
the task.
11. Examples of Distributed Systems
Any social media platform can be viewed this way: its headquarters acts as the centralized computer network, while the computer systems through which users access its services are the autonomous systems of the distributed system architecture.
13. Contd.
• Distributed System Software: this software enables computers to coordinate their activities and to share resources such as hardware, software, data, etc.
• Database: it stores the data processed by each node/system of the distributed system connected to the centralized network.
16. Contd.
• WAP (Wireless Application Protocol) is a standardized technology for cross-platform, distributed computing, very similar to the Internet's combination of Hypertext Markup Language (HTML) and Hypertext Transfer Protocol (HTTP), except that it is optimized for devices with low display capability and low memory.
17. Focusing on Resource Sharing
• Resource sharing is the main motivating factor
for constructing distributed systems. Resources
such as printers, files, web pages or database
records are managed by servers of the
appropriate type. For example, web servers
manage web pages and other web resources.
18. Contd.
• Resource sharing is basically how existing resources in the distributed system can be shared and accessed across different computer systems. The shared resources can be software, hardware, or any data in the distributed system.
19. Contd.
• The resources of a distributed system are made available in the following ways:
• Data Migration - the process in which data is transferred from one location to another in the system. Data is brought to the location of the computation that needs to access it; in other words, the data migrates from the node that holds it to the node that requested it.
20. Contd.
• Computation Migration - the process in which the computation, rather than the data, is transferred across the system. When files are too large to be moved to the computation point (for example, big data files), the computation is sent to the node holding the data, executed there, and only the result is returned to the computation point.
21. Contd.
• Advantages of Computation Migration:
• Increases computational speed.
• Load Balancing - It helps spread the load
across the distributed system in order to
optimize resource sharing.
22. Contd.
• Challenges in a Distributed System
• Scalability - the property of distributed systems whereby, if the load on the system increases, the performance of the system does not degrade.
• Heterogeneity - the ability to communicate with different kinds of devices, for example, communication between a computer and a mobile phone or other peripheral devices.
23. Contd.
• Security Challenges - There are three types of
Security Challenges, i.e. Privacy,
Authentication, Availability.
– Privacy: Data Shared should maintain
confidentiality.
– Authentication: Whoever shares the message
should have proper identity for that system.
Unauthorised users should not access the system.
– Availability: Data and Resources should be
available.
24. Contd.
• Handling of Failure - failures that occur in the system need to be detected and handled.
• Tolerance: if an error occurs while the system is running, the system continues operating without interruption.
• Redundancy: critical components and data are duplicated, so that if one copy fails another can take over.
26. Challenges
• Distributed systems are used in numerous applications, such as online gaming, web applications, and cloud computing. However, creating a distributed system is not simple, and there are a number of design considerations to take into account.
27. Contd.
• Challenges and Failures of a Distributed System
are:
• Heterogeneity.
• Scalability.
• Openness.
• Transparency.
• Concurrency.
• Security.
• Failure Handling.
28. API for Internet Protocol
What is an API?
• An API, or Application Programming Interface, is a facilitator that enables apps, databases, software, and IoT devices to communicate with each other; without it they would not be able to interact. It is a set of tools and protocols used by developers to build user-friendly software applications.
29. Contd.
• An API is a programming interface between application programs and communication subsystems based on open network protocols. The API lets any application program operating in its own MVS (Multiple Virtual Storage) address space access and use the communication services provided by an MVS subsystem that implements this interface.
30. Contd.
• API protocols encompass the standards and processes that an API uses to communicate, as well as the data format. An API can use multiple API protocols; for instance, you may be given the option of using a REST (Representational State Transfer) or SOAP (Simple Object Access Protocol) API to integrate with an email server.
31. Contd.
• In computer science, a protocol pertains specifically to communication: there are at least two parties, and the protocol defines how each is to behave. An API is a static definition of the way some resource can be accessed and/or utilized. An API is typically call-based, whereas a protocol is message-based.
32. External Data Representation and Marshalling
• The information stored in running programs is represented as data structures, while in a distributed system the information in messages transferred between components consists of sequences of bytes. So, to communicate any information, these data structures must be converted to a sequence of bytes before transmission. Likewise, on the arrival of a message, the data must be re-converted into its original data structure.
33. Contd.
• Computers use several different data representations, and these are not the same on every machine to which data must be transferred. Consider how the representations differ:
• Integers have two different byte orders - big-endian and little-endian
• Floats - different representations on different architectures
• Characters - ASCII and Unicode
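The byte-order difference for integers can be seen directly. A minimal Python sketch (the value and format codes are purely illustrative):

```python
import struct

value = 0x12345678

# Pack the same 32-bit integer in both byte orders.
big    = struct.pack(">I", value)   # big-endian: most significant byte first
little = struct.pack("<I", value)   # little-endian: least significant byte first

print(big.hex())     # 12345678
print(little.hex())  # 78563412
```

The same in-memory value produces two different byte sequences, which is exactly why a common external representation is needed on the wire.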
34. Contd.
• To communicate these different types of data between computers effectively, there must be a way to convert every value to a common format. External data representation is an agreed standard that acts as the intermediate data type during transmission.
35. Contd.
• Marshalling
• Marshalling is the process of taking a collection of data structures to be transferred and formatting them into an external data representation suitable for transmission in a message.
36. Contd.
• Unmarshalling
• Unmarshalling is the inverse of this process: reformatting the transferred data on arrival to produce the original data structures at the destination.
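As a sketch of the round trip described above, assuming JSON as the external data representation (any agreed format would serve; the record fields are made up):

```python
import json

# A data structure as it exists inside a running program.
record = {"name": "printer-1", "jobs": [3, 7], "online": True}

# Marshalling: convert the structure into a byte sequence for transmission.
wire_bytes = json.dumps(record).encode("utf-8")

# ... wire_bytes travels across the network as a message ...

# Unmarshalling: rebuild the original structure at the destination.
rebuilt = json.loads(wire_bytes.decode("utf-8"))

assert rebuilt == record  # the destination recovers the original structure
```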
37. Multicast communication
• Multicast communication occurs when a host process communicates with a designated group of processes in a distributed system at the same time. The technique is mainly used to address the problems of a high workload on the host system and of redundant information being sent to every process in the system individually.
38. Contd.
• Multicasting refers to a single source communicating with multiple receivers simultaneously. Most popular distributed multimedia applications require multicasting; for example, multiparty audio/video conferencing is one of the most widely used services in Internet telephony.
39. Contd.
• There are three widely used models for communication: Remote Procedure Call (RPC), Message-Oriented Middleware (MOM), and data streaming. The general problem of sending data to multiple receivers is called multicasting.
40. Contd.
• Multicast Routing Protocols
• To route our multicast traffic, we need to use
a multicast routing protocol. There are two
types of multicast routing protocols:
• Dense Mode
• Sparse Mode
41. Contd.
• Dense Mode
• Dense mode multicast routing protocols are used for networks where most subnets should receive the multicast traffic. When a router receives multicast traffic, it floods it on all of its interfaces except the interface on which the traffic was received.
44. Contd.
Sparse mode:
• To avoid flooding traffic everywhere, PIM (Protocol Independent Multicast) sparse mode uses an RP (Rendezvous Point) in the network. Here’s how it works:
• Each router that receives multicast traffic from a source forwards it to the RP.
• Each router that wants to receive multicast traffic joins at the RP.
45. Contd.
• The RP is like a “meeting point” for multicast traffic. Here’s an example:
46. Contd.
• A multicast router can tell its neighbor that it
doesn’t want to receive the multicast traffic
anymore. This happens when:
• The router doesn’t have any downstream
neighbors that require the multicast traffic.
• The router doesn’t have any hosts on its
directly connected interface that require the
multicast traffic.
48. Contd.
• Above, R1 receives the multicast traffic from our video server. It floods this traffic to R2 and R3, but these two routers have no interest in it. They send a prune message to signal R1 that it should no longer forward the multicast traffic.
49. Contd.
• There are a number of dense mode routing
protocols:
• DVMRP (Distance Vector Multicast Routing
Protocol)
• MOSPF (Multicast OSPF)
• PIM Dense Mode (Protocol Independent
Multicast)
50. Contd.
• RPF (Reverse Path Forwarding)
• Multicast routing is vulnerable to routing loops. One simple loop-prevention mechanism is that routers never forward multicast packets out of the interface on which the packet was received. There is, however, one additional check, called RPF (Reverse Path Forwarding).
52. Contd.
• R1 receives a multicast packet, which is flooded on all interfaces except the interface that connects to the video server. Only the packet flooded towards R3 is shown here:
• R1 floods the packet to R3.
• R3 floods the packet to R2.
• R2 floods it back to R1.
53. Contd.
• When the multicast packet is received on the
interface that matches the information from the
unicast routing table, it passes the RPF check
and we accept the packet. When it fails the
RPF check, we drop the packet.
55. Contd.
• Above we see R1 which floods the multicast
traffic to R2 and R3. R2 also floods it to R3.
• R3 will now perform a RPF check for both
multicast packets. It sees the source address is
192.168.1.100 and checks the unicast routing
table. It finds an OSPF entry for
192.168.1.0/24 that points to R1.
56. Contd.
• The packet R3 receives from R1 passes the RPF check, since it arrives on the interface the unicast routing table points to; the packet R3 receives from R2 does not, so the multicast packet from R2 is dropped.
57. Contd.
• R3 then floods the multicast packet towards R2, which also performs an RPF check. R2 drops this packet, since it uses its interface towards R1 to reach 192.168.1.100.
• Another way to look at this is that the RPF check ensures we only accept multicast packets arriving via the shortest path; multicast packets that travel longer paths are dropped. For R3, the shortest path to 192.168.1.100 is through R1.
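The RPF check above can be sketched as a pure function. The route table and interface names (eth0, eth1) are hypothetical, while the source address 192.168.1.100 matches the running example:

```python
import ipaddress

# Hypothetical unicast routing table of R3: destination prefix -> outgoing interface.
UNICAST_ROUTES = {"192.168.1.0/24": "eth0"}  # eth0 is R3's link towards R1

def rpf_check(source_ip, arrival_interface, routes):
    """Accept a multicast packet only if it arrived on the interface
    the unicast routing table would use to reach its source."""
    for prefix, iface in routes.items():
        if ipaddress.ip_address(source_ip) in ipaddress.ip_network(prefix):
            return iface == arrival_interface
    return False  # no route back to the source: fail the check

# The copy arriving from R1 (via eth0) passes the check...
assert rpf_check("192.168.1.100", "eth0", UNICAST_ROUTES)
# ...while the copy arriving from R2 (via eth1) fails and is dropped.
assert not rpf_check("192.168.1.100", "eth1", UNICAST_ROUTES)
```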
58. Remote Procedure Call
• Remote Procedure Call (RPC) is a
communication technology that is used by one
program to make a request to another program
for utilizing its service on a network without
even knowing the network's details.
59. Contd.
• Remote Procedure Call is a technique for
building distributed systems. Basically, it
allows a program on one machine to call a
subroutine on another machine without
knowing that it is remote. RPC is not a
transport protocol: rather, it is a method of
using existing communications features in a
transparent way. This transparency is one of
the great strengths of RPC as a tool.
60. Contd.
• Because the application software does not contain any communication code, it is independent of the particular communications hardware and protocols used.
• The operating system provides the calling sequence needed to use the underlying communications software.
61. Contd.
• A remote procedure call (RPC) is when a computer
program causes a procedure (subroutine) to execute
in a different address space (commonly on another
computer on a shared network), which is written as if
it were a normal (local) procedure call, without the
programmer explicitly writing the details for the
remote interaction. That is, the programmer writes
essentially the same code whether the subroutine is
local to the executing program, or remote.
62. Contd.
• This is a form of client–server interaction
(caller is client, executor is server), typically
implemented via a request–response message-
passing system. In the object-oriented
programming paradigm, RPCs are represented
by remote method invocation (RMI).
63. Contd.
• The RPC model implies a level of location
transparency, namely that calling procedures
are largely the same whether they are local or
remote, but usually, they are not identical, so
local calls can be distinguished from remote
calls.
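A minimal sketch of the stub mechanics behind RPC, with the network hop replaced by an in-process function call; the procedure name and JSON wire format are illustrative, not those of any particular RPC system:

```python
import json

# "Server" side: the procedures that can be called remotely.
PROCEDURES = {"add": lambda a, b: a + b}

def server_dispatch(request_bytes):
    # Unmarshal the request, invoke the named procedure, marshal the result.
    request = json.loads(request_bytes.decode())
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode()

# "Client" side: the stub hides marshalling and transport so the call
# looks like an ordinary local procedure call.
def remote_add(a, b):
    request_bytes = json.dumps({"proc": "add", "args": [a, b]}).encode()
    reply_bytes = server_dispatch(request_bytes)  # stand-in for the network hop
    return json.loads(reply_bytes.decode())["result"]

assert remote_add(2, 3) == 5  # the caller never sees the marshalling
```

In a real system `server_dispatch` would run in another address space and the bytes would cross the network, but the caller's code would look the same, which is the transparency the slides describe.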
64. Group Communication
• Communication between two processes in a
distributed system is required to exchange
various data, such as code or a file, between
the processes. When one source process tries
to communicate with multiple processes at
once, it is called Group Communication.
65. Contd.
• A group is an abstraction over a collection of interconnected processes. The abstraction hides the message passing so that communication looks like a normal procedure call. Group communication also helps processes on different hosts to work together and perform operations in a synchronized manner, thereby increasing the overall performance of the system.
66. Contd.
• Group communication occurs when a single source process simultaneously attempts to communicate with numerous other processes.
67. Contd.
• A group is an abstract collection of interrelated processes. This abstraction hides the message passing so that the communication seems to be a standard procedure call. Group communication also enables processes on separate hosts to collaborate and carry out activities in a coordinated way, improving overall system performance.
69. Contd.
Types of group communication
• Broadcast communication
This occurs when the host simultaneously attempts to communicate with all the processes in a distributed system. It is helpful when a consistent stream of information must be supplied to all processes efficiently. Communication is very fast compared to other means, since no per-recipient processing is required. It does not, however, support many operations and cannot address each process independently.
71. Contd.
• Multicast communication
• The host process attempts to interact simultaneously with a specific set of processes in a distributed system. This approach is mainly used to reduce the high burden on the host system and to avoid duplicating information across system processes. Multicasting can considerably reduce the time required to handle messages.
73. Contd.
• Unicast communication
• This occurs when the host process interacts with a single process in a distributed system. It works well when exactly two processes need to interact, since traffic flows one-to-one. However, it incurs a cost, since the host must first determine the specific target process and only then communicate the information.
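The three types above can be contrasted in a small in-memory simulation; the process ids, group, and messages are made up for illustration:

```python
# Mailboxes for all processes in the (simulated) distributed system.
inboxes = {pid: [] for pid in ["p1", "p2", "p3", "p4"]}

def unicast(msg, dest):
    inboxes[dest].append(msg)            # one sender, one receiver

def broadcast(msg):
    for pid in inboxes:                  # one sender, every process
        inboxes[pid].append(msg)

def multicast(msg, group):
    for pid in group:                    # one sender, a designated group only
        inboxes[pid].append(msg)

unicast("hi p2", "p2")
broadcast("hello all")
multicast("group update", ["p1", "p3"])

assert inboxes["p2"] == ["hi p2", "hello all"]
assert inboxes["p1"] == ["hello all", "group update"]
assert inboxes["p4"] == ["hello all"]    # p4 is outside the multicast group
```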
75. Contd.
• Group communication characteristics
• Atomicity, often known as the all-or-nothing property, is a crucial property of group communication mechanisms: a message is delivered either to every member of the group or to none of them. If one or more group members fail to receive the message, the process that delivered it gets an error notice.
76. Contd.
• The ordering attribute of messages governs the order in which messages are delivered. Message ordering types include:
• No order - messages are sent to the group without regard for order.
• FIFO order - messages from a given sender are delivered in the order they were sent.
77. Contd.
• Causal order - if one message is sent after another has been received, every member delivers them in that causal order.
• Total order - all messages are delivered to all group members in the same order.
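FIFO order is commonly implemented with per-sender sequence numbers; a minimal sketch (not the API of any particular group-communication toolkit):

```python
class FifoReceiver:
    """Deliver messages from each sender in the order they were sent,
    even if the network reorders them, using per-sender sequence numbers."""

    def __init__(self):
        self.expected = {}   # sender -> next sequence number to deliver
        self.held = {}       # sender -> {seq: message} buffered out of order
        self.delivered = []

    def receive(self, sender, seq, msg):
        self.expected.setdefault(sender, 0)
        self.held.setdefault(sender, {})[seq] = msg
        # Deliver any consecutive run starting at the expected number.
        while self.expected[sender] in self.held[sender]:
            self.delivered.append(self.held[sender].pop(self.expected[sender]))
            self.expected[sender] += 1

r = FifoReceiver()
r.receive("p1", 1, "second")   # arrives early: held back, not delivered
r.receive("p1", 0, "first")    # releases both messages in FIFO order
assert r.delivered == ["first", "second"]
```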
78. Contd.
• Group organization
• Group communication systems can be classified as either closed or open. Only members of a closed group can send messages to the group; users who are not group members can send messages only to each member separately. In an open group, non-members can also send messages to the group. The program's objective determines whether a closed or open group is used.
79. Publish-subscribe systems
• A distributed publish/subscribe configuration
is a set of queue managers connected together.
The queue managers can all be on the same
physical system, or they can be distributed
over several physical systems.
80. Contd.
• Pub/Sub provides a framework for exchanging
messages between publishers (components that
create and send messages) and subscribers
(components that receive and consume
messages). Note that publishers don't send
messages to specific subscribers in a point-to-
point manner.
81. Contd.
• Publish–subscribe is a messaging pattern
where senders of messages, called publishers,
do not program the messages to be sent
directly to specific receivers, called
subscribers, but instead categorize published
messages into classes without knowledge of
which subscribers, if any, there may be.
82. Contd.
• A distributed Pub/Sub system is a communication paradigm that allows freedom in a distributed system by decoupling the communicating entities in terms of time, space, and synchronization.
• It is an event service system that is asynchronous, anonymous, and loosely coupled.
• It has the ability to adapt quickly in a dynamic environment.
83. Contd.
• Key components of a Pub/Sub system
• Publishers: publishers generate event data and publish it.
• Subscribers: subscribers submit their subscriptions and process the events received.
• P/S service: the mediator/broker that filters and routes events from publishers to interested subscribers.
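The three components can be sketched as a minimal in-memory broker; the topic names and the callback-based subscribe API are illustrative only, not any real Pub/Sub product:

```python
from collections import defaultdict

class Broker:
    """Minimal publish/subscribe mediator: publishers and subscribers know
    only the broker and a topic name, never each other (space decoupling)."""

    def __init__(self):
        self.subscriptions = defaultdict(list)  # topic -> subscriber callbacks

    def subscribe(self, topic, callback):
        self.subscriptions[topic].append(callback)

    def publish(self, topic, event):
        # Route the event to every subscriber interested in this topic.
        for callback in self.subscriptions[topic]:
            callback(event)

broker = Broker()
received = []
broker.subscribe("news", received.append)
broker.publish("news", "breaking!")
broker.publish("sports", "ignored")   # no subscriber for this topic
assert received == ["breaking!"]
```

Note that the publisher never names a subscriber: it only names a topic, which is the decoupling the slides describe.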