
CHAPTER ONE
Introduction to Distributed Systems

Main contents
• Introduction
• Definitions
• Goals

Distributed Systems Goals
By the end of the course, students will be able to:
1. Define a distributed system and give examples of several different
distributed systems paradigms.
2. Design and implement application level communication protocols using
TCP or UDP.
3. Design and implement a tool which works in a client-server architecture and uses TCP
or UDP for communication.
4. Design and implement a tool which works in a peer-to-peer architecture and uses TCP
or UDP for communication.
5. Explain fundamental problems in distributed systems relating to
synchronization, mutual exclusion, replication, and fault tolerance.
6. Design and implement applications which communicate using application level
multicast and epidemic communication mechanisms.
7. Build and test a distributed communication tool which is based on the Apache Kafka
Distributed Streaming Platform.

Distributed System: Definition
A distributed system is a piece of software that ensures that
a collection of autonomous computing elements appears to its users as a
single coherent system.
Two aspects of a distributed system:
1) Autonomous computing elements, also referred to as nodes, be they hardware
devices or software processes, and
2) Single coherent system: users or applications perceive a single system, so nodes
need to collaborate.
Each node is autonomous:
• Its own notion of time, so there is no global clock
• Leads to fundamental synchronization and coordination problems.
Collection of nodes and group:
• How to manage group membership?
• How to know you are communicating with an authorized member?

Why Distributed?
Because we need to…
Resource and Data Sharing
printers, databases, multimedia servers, ...
Availability, Reliability
the loss of some instances can be hidden
Scalability, Extensibility
the system grows with demand (e.g., extra servers)
Performance
huge power (CPU, memory, ...) available
Inherent distribution, communication
organizational distribution, e-mail, video

Problems of Distribution
Concurrency, Security
clients must not disturb each other
Privacy
unwanted communication such as spam
Partial failure
we often do not know where the error is (e.g., RPC)
Location, Migration, Replication
clients must be able to find their servers
Heterogeneity
hardware, platforms, languages, management

Characteristics of Distributed Systems
• Differences between the computers and the way they communicate are hidden from
users.
• Users and applications can interact with a distributed system in a consistent and
uniform way regardless of location.
• Distributed systems should be easy to expand and scale.
• A distributed system is normally continuously available, even if there may be partial
failures.

Organization of distributed system
Overlay network
• Each node in the collection
communicates only with other
nodes in the system, its neighbors.
• The set of neighbors may be
dynamic, or may even be known
only implicitly (i.e., requires a
lookup).
Overlay types
• Well-known example of overlay networks:
peer-to-peer systems.
– Structured: each node has a well-defined
set of neighbors with whom it can
communicate (tree, ring).
– Unstructured: each node has references
to randomly selected other nodes from the
system.
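A minimal sketch of the two overlay types, assuming nodes are simply numbered 0..n-1. The function names `ring_neighbors` and `random_neighbors` are illustrative and not taken from any particular P2P system.

```python
import random

def ring_neighbors(node, n):
    """Structured overlay: on a ring of n nodes, each node has a fixed,
    well-defined pair of neighbors (predecessor and successor)."""
    return [(node - 1) % n, (node + 1) % n]

def random_neighbors(node, n, k, rng):
    """Unstructured overlay: each node keeps references to k other nodes
    chosen at random from the system."""
    others = [i for i in range(n) if i != node]
    return rng.sample(others, k)

if __name__ == "__main__":
    print(ring_neighbors(0, 8))
    print(random_neighbors(0, 8, 3, random.Random(1)))
```

In the ring, a lookup can follow successors in a predictable way; in the random overlay, finding a given node generally needs flooding or random walks.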

Coherent system
• The collection of nodes as a whole operates the same no matter where,
when, and how interaction between a user and the system takes place.
• Examples:
1. An end user cannot tell where a computation is taking place
2. Where data is exactly stored should be irrelevant to an application
3. Whether data has been replicated or not is completely hidden
• Key: distribution transparency
• Partial failures:
It is inevitable that at any time only a part of the distributed system fails. Hiding
partial failures and their recovery is often very difficult, and in general
impossible.

Middleware: OS of Distributed Systems
What’s inside?
• Commonly used components and functions that need not be
implemented by applications separately.

What do we want to achieve?
• Supporting sharing of resources
• Distribution transparency
• Openness
• Scalability

Sharing resources
• Canonical examples
– Cloud-based shared storage and files
– Peer-to-peer assisted multimedia streaming
– Shared mail services (think of outsourced mail systems)
– Shared Web hosting (think of content distribution networks)
Observation:
• “The network is the computer”
(quote from John Gage, then at Sun Microsystems)

Distribution Transparency
Transparency   Description
Access         Hide differences in data representation and how an object is accessed
Location       Hide where an object is located
Relocation     Hide that an object may be moved to another location while in use
Migration      Hide that an object may move (itself) to another location
Replication    Hide that an object is replicated
Concurrency    Hide that an object may be shared by several independent users
Failure        Hide the failure and recovery of an object
Note: Distribution transparency is a nice goal, but aiming at full distribution transparency may be too much.

Degree of Transparency
Observation: Aiming at full distribution transparency may be too much.
• There are communication latencies that cannot be hidden
• Completely hiding failures of networks and nodes is (theoretically
and practically) impossible
– You cannot distinguish a slow computer from a failing one
– You can never be sure that a server actually performed an operation
before a crash
• Full transparency will cost performance, exposing distribution of the
system
– Keeping replicas exactly up-to-date with the master takes time
– Immediately flushing write operations to disk for fault tolerance

Exposing Distribution
• Exposing distribution may be good
– Making use of location-based services (finding your nearby
friends)
– When dealing with users in different time zones
– When it makes it easier for a user to understand what’s going on
(e.g., when a server does not respond for a long time, report it as
failing).
• Conclusion:
– Distribution transparency is a nice goal, but achieving it is a
different story, and it should often not even be aimed at.

Openness of Distributed Systems
Open distributed system
Be able to interact with services from other open systems, irrespective
of the underlying environment:
• Systems should conform to well-defined interfaces
• Systems should support portability of applications
• Systems should easily interoperate
Achieving openness
At least make the distributed system independent from heterogeneity of the
underlying environment:
• Hardware
• Platforms
• Languages

Policy versus Mechanisms
Implementing openness: Support for different policies:
• What level of consistency do we require for client-cached data?
• Which operations do we allow downloaded code to perform?
• Which QoS requirements do we adjust in the face of varying
bandwidth?
• What level of secrecy do we require for communication?
Implementing openness: Ideally, a distributed system provides only
mechanisms:
• Allow (dynamic) setting of caching policies
• Support different levels of trust for mobile code
• Provide adjustable QoS parameters per data stream
• Offer different encryption algorithms
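A small sketch of the policy/mechanism split for the caching example. The `Cache` class is the mechanism; the freshness rule passed to it is the policy. All names here (`Cache`, `max_age`, `always_fresh`) are hypothetical and chosen only for illustration.

```python
import time

class Cache:
    """Mechanism: stores values and asks a pluggable policy whether a
    cached entry may still be used."""

    def __init__(self, is_fresh):
        self._is_fresh = is_fresh      # policy: age in seconds -> bool
        self._entries = {}             # key -> (value, time stored)

    def put(self, key, value, now=None):
        stored_at = time.monotonic() if now is None else now
        self._entries[key] = (value, stored_at)

    def get(self, key, now=None):
        if key not in self._entries:
            return None
        value, stored_at = self._entries[key]
        now = time.monotonic() if now is None else now
        return value if self._is_fresh(now - stored_at) else None

# Two example policies that can be set dynamically.
def always_fresh(age):
    return True

def max_age(seconds):
    return lambda age: age <= seconds
```

The same mechanism serves an application that tolerates stale data (`always_fresh`) and one that requires data no older than a few seconds (`max_age(5)`).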

Scale in Distributed Systems
Observation
Many developers of modern distributed systems easily use the adjective
“scalable” without making clear why their system actually scales.
Scalability
Scalability has at least three dimensions:
• Number of users and/or processes (size scalability)
• Maximum distance between nodes (geographical scalability)
• Number of administrative domains (administrative scalability)
Observation
Most systems account only, to a certain extent, for size scalability. But,
today, the challenge lies in geographical and administrative
scalability.

Geographical scalability
Cannot simply go from LAN to WAN:
• Many distributed systems assume synchronous (the same) client-server interactions
• Client sends request and waits for an answer.
• Latency (delay) may easily prohibit this scheme.
WAN links are often inherently unreliable
• Simply moving streaming video from LAN to WAN will fail.
Lack of multipoint communication
• A simple search broadcast cannot be deployed.
• Solution: Develop separate naming and directory services (having
their own scalability problems).

Administrative scalability
Essence: Conflicting policies concerning usage, management, and
security
Exception: Several P2P networks
• File-sharing systems (based, e.g., on BitTorrent)
• Peer-to-peer telephony (Skype)
• Peer-assisted audio streaming (Spotify)
Note: End users collaborate and not administrative entities.
Examples:
• Computational grids: share expensive resources between different domains.
• Shared equipment: how to control, manage, and use a shared radio telescope
constructed as a large-scale shared sensor network?

Size Scalability
Root causes for scalability problems with centralized
solutions
• Computational capacity, limited by the CPUs
• Storage capacity, including the transfer rate between CPUs
and disks
• Network between the user and the centralized service

Size Scaling Technique 1
Hide communication latencies
Avoid waiting for responses; do something else:
• Make use of asynchronous communication
• Have a separate handler for the incoming response
Problem: not every application fits this model
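A minimal sketch of asynchronous communication with a separate response handler, using Python's standard `concurrent.futures`. The function `slow_remote_call` is a stand-in for a real remote request (the sleep simulates network latency), and `send_async` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_remote_call(x):
    """Stand-in for a remote request; the sleep simulates latency."""
    time.sleep(0.1)
    return x * 2

def send_async(executor, x, on_response):
    """Issue the request and return immediately. The handler on_response
    is called with the reply once it arrives."""
    future = executor.submit(slow_remote_call, x)
    future.add_done_callback(lambda f: on_response(f.result()))
    return future

if __name__ == "__main__":
    replies = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        send_async(pool, 21, replies.append)
        print("caller keeps working while the request is in flight")
    # Leaving the with-block waits for outstanding requests.
    print(replies)
```

The caller is not blocked for the duration of the round trip. This only helps if the application has other useful work to do in the meantime.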

Offloading Work
• Facilitate solution by moving computations to client
A) a server checking the correctness of field entries
B) a client doing the job

Distribution
Partition data and computations across multiple machines:
• Move computations to clients (e.g., Java applets)
• Decentralized naming services (e.g., DNS)
• Decentralized information systems (e.g., WWW)
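One simple way to partition data across machines is to hash each key and use the hash to pick the responsible node. The sketch below assumes a fixed list of node names; `owner` and `partition` are illustrative names. Real systems such as DNS partition by name hierarchy instead, so this shows only one possible scheme.

```python
import hashlib

def owner(key, nodes):
    """Map a key to the node responsible for it, using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

def partition(keys, nodes):
    """Return a dict: node -> list of keys stored at that node."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[owner(key, nodes)].append(key)
    return placement

if __name__ == "__main__":
    print(partition(["alice", "bob", "carol", "dave"], ["n0", "n1", "n2"]))
```

Every client that knows the node list computes the same owner for a key, so no central directory is needed for lookups.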

Replication/caching
Make copies of data available at different machines:
• Replicated file servers and databases
• Mirrored Web sites
• Web caches (in browsers and proxies)
• File caching (at server and client)

Replication Problems
Observation
Applying scaling techniques is easy, except for one thing:
• Having multiple copies (cached or replicated) leads to
inconsistencies:
• modifying one copy makes that copy different from the rest.
• Always keeping copies consistent in a general way requires
global synchronization (the same) on each modification.
• Global synchronization precludes large-scale solutions.
Observation
If we can tolerate inconsistencies, we may reduce the need for global
synchronization, but tolerating inconsistencies is application
dependent.
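The inconsistency problem can be illustrated with a toy store that has a primary copy and a lazily updated backup. The class names and the explicit `sync()` step are assumptions made for the sketch; real systems propagate updates over the network.

```python
class Replica:
    def __init__(self):
        self.data = {}

class LazyReplicatedStore:
    """Writes go to the primary copy. The backup copy is brought up to
    date only when sync() is called, so reads from it may be stale."""

    def __init__(self):
        self.primary = Replica()
        self.backup = Replica()
        self._pending = []           # updates not yet applied to the backup

    def write(self, key, value):
        self.primary.data[key] = value
        self._pending.append((key, value))

    def read_from_backup(self, key):
        return self.backup.data.get(key)

    def sync(self):
        for key, value in self._pending:
            self.backup.data[key] = value
        self._pending.clear()

if __name__ == "__main__":
    store = LazyReplicatedStore()
    store.write("x", 1)
    print(store.read_from_backup("x"))   # stale: update not yet propagated
    store.sync()
    print(store.read_from_backup("x"))
```

Calling `sync()` on every write would remove the stale read, but that is exactly the global synchronization that limits scalability.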

Developing Distributed Systems: Pitfalls
Observation
Many distributed systems are needlessly complex because of mistakes
that required patching later on. These mistakes stem from many false assumptions:
• The network is reliable
• The network is secure
• The network is homogeneous
• The topology does not change
• Latency is zero
• Bandwidth is infinite
• Transport cost is zero
• There is one administrator

CHAPTER TWO
Architectures for Distributed Systems

Main contents
• Architectural Styles
• System Architectures

Definitions
Software Architectures:
Describe the organization and interaction of software components; focuses
on logical organization of software (component interaction, etc.)
System Architectures:
Describe the placement of software components on physical machines
The realization of an architecture may be
centralized (most components located on a single machine),
decentralized (most machines have approximately the same functionality), or
hybrid (some combination).

Architectural Styles
An architectural style describes a particular way to configure a collection
of components and connectors.
Component: a module with well-defined interfaces; reusable, replaceable
Connector: communication link between modules
Architectures suitable for distributed systems:
• Layered architectures
• Object-based architectures
• Data-centered architectures
• Event-based architectures

Figure 2-1. (a) The layered architectural style and (b) the object-based architectural style.
Object based is less structured
component = object
connector = RPC or RMI

Data-Centered Architectures
Main purpose: data access and update
Processes interact by reading and modifying data in some shared
repository (active or passive)
Traditional database (passive): responds to requests
Blackboard system (active): clients solve problems collaboratively;
system updates clients when information changes.
Another example: web-based distributed systems where
communication is through web services (Ch. 12 of the reference book)

Figure 2-2. (a) The event-based architectural style
• Communication via event propagation; in
distributed systems, often seen in publish/subscribe,
e.g., register interest in market info; get email updates
• Decouples sender & receiver; asynchronous
communication
Event-based architecture supports several
communication styles:
• Publish-subscribe
• Broadcast
• Point-to-point
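A minimal in-process sketch of publish-subscribe. The `EventBus` class is hypothetical. For simplicity it delivers events synchronously inside `publish`, whereas a real event-based middleware would deliver them asynchronously and across machines.

```python
from collections import defaultdict

class EventBus:
    """Publishers and subscribers know only the bus and a topic name,
    never each other."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of handlers

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        """Deliver the event to every subscriber of the topic and
        return how many handlers were notified."""
        handlers = list(self._subscribers[topic])
        for handler in handlers:
            handler(event)
        return len(handlers)

if __name__ == "__main__":
    bus = EventBus()
    bus.subscribe("market", lambda e: print("update:", e))
    bus.publish("market", {"symbol": "XYZ", "price": 10})
```

The publisher does not know who receives the event or whether anyone does, which is the decoupling of sender and receiver named on the slide.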

Figure 2-2. (b) The shared data-space architectural style.
Data Centric Architecture;
e.g., shared distributed file systems or Web-based distributed systems
Combination of data-centered and event based architectures
Processes communicate asynchronously

Distribution Transparency
An important characteristic of software architectures in
distributed systems is that they are designed to support
distribution transparency.
Transparency involves trade-offs
Different distributed applications require different
solutions/architectures
There is no “silver bullet” – no one-size-fits-all system.

System Architectures for Distributed Systems
Centralized: traditional client-server structure
Vertical (or hierarchical) organization of communication and control paths
(as in layered software architectures)
Logical separation of functions into client (requesting process) and server
(responder)
Decentralized: peer-to-peer
Horizontal rather than hierarchical communication and control
Communication paths are less structured; symmetric functionality
Hybrid: combine elements of C/S and P2P
Edge-server systems and Collaborative distributed systems.
Classification of a system as centralized or decentralized refers to
communication and control organization, primarily.

Traditional Client-Server
Processes are divided into two groups (clients and servers).
Synchronous communication: request-reply protocol
In LANs, often implemented with a connectionless protocol
(unreliable)
In WANs, communication is typically connection-oriented TCP/IP
(reliable)

C/S Architectures
Figure 2-3. General interaction between a client and a server.

Transmission Failures
With connectionless transmissions, failure of any sort means no
reply
Possibilities:
• Request message was lost
• Reply message was lost
• Server failed either before, during or after performing
the service
Can the client tell which of the above errors took place?
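The question can be explored with a small UDP experiment on the local machine. In the sketch below, `request` retries a few times and gives up. From the client's side, all three failure cases look the same: a timeout. The helper `start_echo_server` and its `reply` switch are assumptions made for the demonstration.

```python
import socket
import threading

def request(server_addr, payload, timeout=0.2, retries=2):
    """Send a UDP request. Return the reply, or None if none arrived.
    A timeout does not reveal whether the request was lost, the reply
    was lost, or the server failed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _attempt in range(retries + 1):
            sock.sendto(payload, server_addr)
            try:
                reply, _addr = sock.recvfrom(1024)
                return reply
            except socket.timeout:
                continue
    return None

def start_echo_server(reply=True):
    """Start a local UDP server. With reply=False it receives requests
    but never answers, which imitates a lost reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))            # port 0: let the OS pick one

    def serve():
        while True:
            try:
                data, addr = server.recvfrom(1024)
            except OSError:
                return                       # socket was closed
            if reply:
                server.sendto(data.upper(), addr)

    threading.Thread(target=serve, daemon=True).start()
    return server
```

Note that retrying is only safe when the operation is idempotent: if the server did perform the service and only the reply was lost, a retry performs it twice.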

Layered (software) Architecture for Client-Server Systems
User-interface level: GUIs (usually) for interacting with end users
Processing level: data processing applications – the core functionality
Data level: interacts with database or file system
Data is usually persistent; it exists even if no client is accessing its file
or database system

Examples
Web search engine
Interface: type in a keyword string
Processing level: processes to generate DB queries, rank replies, format response
Data level: database of web pages
Stock broker’s decision support system
Interface: likely more complex than simple search
Processing: programs to analyze data; rely on statistics, AI perhaps, may require large
simulations
Data level: DB of financial information
Desktop “office suites”
Interface: access to various documents, data, ...
Processing: word processing, database queries, spreadsheets, ...
Data: file systems and/or databases

Application Layering
Figure 2-4. The simplified organization of an Internet search engine into three different layers.

System Architecture
Mapping the software architecture to system hardware
Correspondence between logical software modules and actual computers
Multi-tiered architectures
• Layer and tier are roughly equivalent terms, but layer typically implies software
and tier is more likely to refer to hardware.
• Two-tier and three-tier are the most common multi-tiered architectures

Two-tiered C/S Architectures
• Server provides processing and data management;
• client provides simple graphical display (thin-client)
• Perceived performance loss at client
• Easier to manage, more reliable, client machines don’t need to be so
large and powerful
At the other extreme, all application processing and some data reside at
the client (fat-client approach)
Pro: reduces work load at server; more scalable
Con: harder to manage by system admin, less secure

Multitiered Architectures
Figure 2-5. Alternative client-server organizations (a)–(e), ranging from thin client to fat client.

Three-tiered Architectures
In some applications servers may also need to be clients, leading to a
three-level architecture
• Distributed transaction processing
• Web servers that interact with database servers
Distribute functionality across three levels of machines instead of two.

Multitiered Architectures (3 Tier Architecture)
Figure 2-6. An example of a server acting as client.

Centralized vs. Decentralized Architectures
Traditional client-server architectures exhibit vertical distribution. Each
level serves a different purpose in the system.
Logically different components reside on different nodes
Horizontal distribution (P2P): each node has roughly the same processing
capabilities and stores/manages part of the total system data.
Better load balancing, more resistant to denial-of-service attacks,
harder to manage than C/S
Communication & control is not hierarchical; all nodes are about equal

CHAPTER THREE
PROCESS
2/9/2024 52

3 Principles of Processes in Distributed Systems
• Communication takes place between processes, and a process is a
program in execution
• From the OS perspective, management and scheduling of processes are
important.
• Other important issues that arise in distributed systems include:
– Multithreading is used to enhance performance by overlapping
communication and local processing.
– How are clients and servers organized, and what are the server design issues?
– Process or code migration to achieve scalability and to dynamically
configure clients and servers

3.1 Threads and their Implementation
What is a process, and how are processes and threads related?
Process tables or PCBs (process control blocks) are used to keep track of
processes. There are usually many processes executing concurrently
• Processes should not interfere with each other.
• Sharing resources by processes is transparent.
This concurrency transparency has a high price:
allocating resources for a new process and context switching take time.
A thread also executes independently from other threads,
but there is no attempt to achieve a high degree of concurrency transparency,
which results in better performance

Threads can be used in both distributed and nondistributed systems
Threads in Nondistributed Systems
A process has an address space (containing program text and data) and a
single thread of control, as well as other resources such as open files,
child processes, accounting information, etc.
Figure: (a) three processes, each with one thread; (b) one process with three threads

Each thread has its own program counter, registers, stack, and state;
but all threads of a process share address space, global variables and
other resources such as open files, etc.
2/9/2024 56

Threads take turns running
Threads allow multiple executions to take place in the same process
environment, which is called multithreading. Benefits:
1) Simplifying the programming model, since many activities are
going on at once, more or less independently
2) They are easier to create and destroy than processes, since
they do not have any resources attached to them
3) Performance improves by overlapping activities if there is a lot of
I/O, i.e., to avoid blocking when waiting for input or
doing calculations, say in a spreadsheet
4) Real parallelism is possible in a multiprocessor system

In nondistributed systems, threads can be used with shared data
instead of processes to avoid the context-switching overhead of inter-process
communication (IPC)
Figure: context switching as the result of IPC

Thread Implementation
Threads are usually provided in the form of a thread package. The package contains
operations to create and destroy threads, and operations on synchronization variables such as
mutexes and condition variables
Two approaches to constructing a thread package
a) Construct a thread library that is executed entirely in user mode (the OS is not
aware of threads)
Advantage: it is cheap to create and destroy threads; just allocate and free memory
Advantage: context switching can be done using a few instructions;
store and reload only CPU register values
Disadvantage: invocation of a blocking system call will block the entire process to
which the thread belongs and all other threads in that process
b) Implement them in the OS kernel
Let the kernel be aware of threads and schedule them
Disadvantage: thread operations such as creation and deletion are expensive,
since each requires a system call

Threads in Distributed Systems
Multithreaded Clients
• Consider a web browser; fetching different parts of a page can be implemented
as separate threads,
• Each opening its own TCP connection to the server
• Each can display the results as it gets its part of the page
• Parallelism can also be achieved with replicated servers, since each thread's
request can be forwarded to a separate replica
Multithreaded Servers
Servers can be constructed in three ways
A. Single-threaded process
• It gets a request
• Examines it
• Carries it out to completion before getting the next request

B. Threads
Threads are more important for implementing servers
e.g., a file server
The dispatcher thread reads incoming requests for a file operation from
clients and passes each one to an idle worker thread
C. Finite-state machine
If threads are not available, it gets a request, examines it, tries to fulfill the
request from cache, else sends a request to the file system.
The models or approaches can be summarized as follows:
Model                     Characteristics
Single-threaded process   No parallelism, blocking system calls
Threads                   Parallelism, blocking system calls (thread only)
Finite-state machine      Parallelism, nonblocking system calls
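The dispatcher/worker model (B) can be sketched with a shared queue: the dispatcher puts requests on the queue and whichever worker is idle picks up the next one. This is a minimal sketch that assumes file reads are replaced by a placeholder string; `serve` and `worker` are hypothetical names.

```python
import queue
import threading

def worker(requests, results):
    """Worker thread: handle requests until a None sentinel arrives."""
    while True:
        name = requests.get()
        if name is None:
            break
        # Placeholder for a blocking file read on behalf of a client.
        results.put((name, f"contents of {name}"))

def serve(file_names, num_workers=3):
    """Dispatcher: hand each request to the worker pool, then collect
    the replies as a dict of file name -> contents."""
    requests, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(requests, results))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for name in file_names:
        requests.put(name)
    for _ in workers:
        requests.put(None)          # one shutdown sentinel per worker
    for w in workers:
        w.join()
    return dict(results.get() for _ in file_names)

if __name__ == "__main__":
    print(serve(["a.txt", "b.txt", "c.txt"]))
```

While one worker is blocked on a slow operation, the others keep serving requests, which matches the "blocking system calls (thread only)" entry in the table.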
2/9/2024 61

Virtualization
The separation between having a single CPU and being able to pretend there are more can be
extended to other resources as well, leading to what is known as resource
virtualization.
The Role of Virtualization in Distributed Systems
Virtualization deals with extending or replacing an existing interface so as to
mimic the behavior of another system.
It allows legacy software (various applications and the operating systems they were
developed for) to run on expensive mainframe hardware.

Role of Virtualization
Figure: (a) General organization between a program, interface, and system.
(b) General organization of virtualizing system A on top of system B.

Architectures of Virtual Machines
To understand the differences in virtualization, it is important to realize that
computer systems generally offer four different types of interfaces, at four
different levels:
1) An interface between the hardware and software, consisting of machine
instructions that can be invoked by any program.
2) An interface between the hardware and software, consisting of machine
instructions that can be invoked only by privileged programs, such as an
operating system.
3) An interface consisting of system calls as offered by an operating system.
4) An interface consisting of library calls, generally forming what is known
as an application programming interface (API). In many cases, the
aforementioned system calls are hidden by an API.
2/9/2024 64

2/9/2024 65
Various interfaces offered by computer systems.
Virtualization can takes place in to two different ways:
1. A process VM: we can build a runtime system that essentially provides an abstract
instruction set, that is to be used for executing applications.
Eg. Java virtual machine
2. System VM: is to provide a system that is essentially implemented as a layer completely
shielding the original hardware, but offering the complete instruction set of that same (or
other hardware) as an interface.
• Virtual Machine Monitor Eg. VMWare

Figure: (a) A process VM, with multiple instances of (application, runtime) combinations.
(b) A VMM, with multiple instances of (applications, OS) combinations.

Anatomy of Clients and Servers
Two issues: user interfaces and client-side software for distribution transparency
A. Networked User Interfaces
To create a convenient environment for the interaction of a human user and a
remote server;
e.g. mobile phones with simple displays and a set of keys
GUIs are most commonly used
The X Window System (or simply X) as an example
It has the X kernel: the part of the OS that controls the terminal (monitor,
keyboard, pointing device like a mouse) and is hardware dependent
2/9/2024 67

B. Client-Side Software for Distribution Transparency
In addition to the user interface, parts of the processing and data level
in a client-server application are executed at the client side.
An example is embedded client software for ATMs, cash registers, etc.
Moreover, client software can also include components to achieve
distribution transparency,
e.g., replication transparency by means of client-side solutions

Servers and Design Issues
1 General Design Issues
A server is a process implementing a specific service on behalf of a collection of clients.
a. How to organize servers?
Iterative server
The server itself handles the request and returns the result
Concurrent server
It passes a request to a separate process or thread and waits for the next
incoming request; e.g., a multithreaded server;
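The two organizations can be compared with Python's standard `socketserver` module: `TCPServer` handles one request at a time (iterative), while `ThreadingTCPServer` hands each connection to a new thread (concurrent). The handler, the one-line upper-casing service, and the helper names are assumptions made for this sketch.

```python
import socket
import socketserver
import threading

class UpperHandler(socketserver.StreamRequestHandler):
    """Toy service: read one line and send it back in upper case."""
    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line.upper())

def start_server(concurrent=True):
    """Start an iterative or concurrent server on a free local port."""
    cls = socketserver.ThreadingTCPServer if concurrent else socketserver.TCPServer
    server = cls(("127.0.0.1", 0), UpperHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def ask(address, text):
    """Client side: send one line and return the server's reply."""
    with socket.create_connection(address, timeout=2) as conn:
        conn.sendall(text.encode() + b"\n")
        return conn.makefile("rb").readline().decode().strip()

if __name__ == "__main__":
    srv = start_server(concurrent=True)
    print(ask(srv.server_address, "hello"))
    srv.shutdown()
    srv.server_close()
```

Both servers give the same answers. The difference shows under load: with the iterative server a slow request delays every client queued behind it.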
2/9/2024 69

b. Where do clients contact a server?
Using endpoints or ports at the machine where the server is running and
each server listens to a specific endpoint
How do clients know the endpoint of a service?
Globally assign endpoints for well-known services;
e.g., FTP is on TCP port 21, HTTP is on TCP port 80
Run a daemon that keeps track of the current endpoints of
each service implemented by a co-located server.

How can a server be interrupted?
• Abruptly (suddenly) exit the client application
• Send out-of-band data
Whether or not the server is stateless:
• A stateless server does not keep information on the state of its clients,
and can change its own state without having to inform any client
• A particular form of a stateless design is where the server maintains
what is known as soft state: it keeps information on its clients for a
limited period of time.
• A stateful server generally maintains persistent information on its clients.
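Soft state can be sketched as a table whose entries expire unless the client refreshes them. The class name `SoftStateTable` and the explicit `now` argument (used instead of a real clock to keep the example deterministic) are assumptions for illustration.

```python
class SoftStateTable:
    """Server-side soft state: information about a client is kept only
    for ttl time units after the client's last refresh."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._last_seen = {}        # client id -> time of last refresh

    def refresh(self, client_id, now):
        self._last_seen[client_id] = now

    def active_clients(self, now):
        """Drop expired entries and return the remaining client ids."""
        self._last_seen = {c: t for c, t in self._last_seen.items()
                           if now - t <= self.ttl}
        return sorted(self._last_seen)

if __name__ == "__main__":
    table = SoftStateTable(ttl=10)
    table.refresh("a", now=0)
    table.refresh("b", now=5)
    print(table.active_clients(now=12))
```

If the server crashes and loses the table, nothing needs to be recovered: clients simply refresh again, which is why soft state counts as a form of stateless design.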
2/9/2024 71

Code Migration
So far, communication was concerned with passing data.
We may also pass programs, even while they are running and in heterogeneous systems.
Code migration involves moving data as well:
When a program migrates while running, its status, pending signals,
and other parts of its environment such as the stack and the program
counter also have to be moved

Reasons for Migrating Code
To improve performance: move processes from heavily-loaded to
lightly-loaded machines (load balancing)
To reduce communication: move a client application that performs
many database operations to a server if the database resides on the
server; then send only results to the client
To exploit parallelism (for nonparallel programs): e.g., copies of a
mobile program (called a mobile agent, as used in search engines)
moving from site to site searching the Web

Models for Code Migration
Code migration doesn’t only mean moving code; in some cases, it also means moving
the execution status of a program, pending signals, and other parts of the execution
environment
A process consists of three segments: code segment (set of instructions), resource
segment (references to external resources such as files, printers, ...), and execution
segment (to store the current execution state of a process such as private data, the
stack, the program counter)
Alternatives for code migration
• weak versus strong mobility
• is it sender- or receiver-initiated
• is it executed at the target process or in a separate process (for weak mobility);
migrate or clone process (for strong mobility)

Weak Mobility
Transfer only the code segment and maybe some initialization data; in
this case a program always starts from its initial state, e.g., Java applets
Execution can be by
• the target process (in its own address space, as in Java applets), but the target
process and local resources must be protected (security), or
• a separate process; local resources must still be protected (security)

Strong Mobility (or process migration)
Transfer code and execution segments; helps to migrate a process in execution:
stop execution, move it, and then resume execution from where it stopped
Migration can be
Sender-initiated: initiated by the machine where the code resides or is currently running;
e.g., uploading programs to a server; may need authentication or that the client is
a registered one; crawlers to index Web pages
Receiver-initiated: initiated by the target machine; e.g., Java applets; easier to implement
In a client-server model, receiver-initiated migration is easier to implement since security
issues are minimized;
if clients are allowed to send code (sender-initiated), the server must know them
since they may access resources such as disk on the server

Summary of models of code migration
2/9/2024 77

Resource-to-Machine Bindings
Unattached Resources: can be easily moved with the migrating
program (such as data files associated with the program)
Fastened Resources: such as local databases and complete Web sites;
moving or copying may be possible, but very costly
Fixed Resources: intimately bound to a specific machine or
environment such as local devices and cannot be moved.

Types of Process-to-Resource Bindings
Binding by identifier (the strongest): a resource is referred to by its
identifier; the process requires exactly that resource; e.g., a URL to refer to a Web
page or an FTP server referred to by its Internet (IP) address
Binding by value (weaker): when only the value of a resource is needed;
in this case another resource can provide the same value; e.g., standard
libraries of programming languages such as C or Java which are normally
locally available, but their location in the file system may vary from site to site
Binding by type (weakest): a process needs a resource of a specific type;
e.g., references to local devices such as monitors and printers

CHAPTER FOUR
Communication

Objectives of the Chapter
Review of how processes communicate in a network (the rules or the protocols) and
their structures
Introduce the most widely used communication models for distributed systems:
• Network Protocols and Standards
• Remote Procedure Call (RPC): hides the details of message passing and is suitable
for client-server models
• Remote Object (Method) Invocation (RMI)
• Message-Oriented Middleware (MOM): instead of the client-server model, think in
terms of messages and have a high-level message queuing model similar to e-mail
• Stream-Oriented Communication: for multimedia, to support the continuous flow of
messages with timing constraints
• Multicast Communication: information dissemination to several recipients
• Web services: offering general services to remote applications without immediate
interactions from end users

Network Protocols and Standards
Why communication in distributed systems?
Because there is no shared memory
Two communicating processes must agree on the syntax and semantics of
messages
A protocol is a set of rules that governs data communications
A protocol defines what is communicated, how it is communicated, and
when it is communicated
The key elements of a protocol are syntax, semantics, and timing
Syntax: refers to the structure or format of the data
Semantics: refers to the meaning of each section of bits
Timing: refers to when data should be sent and how fast they can be sent
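The syntax and semantics of a protocol can be illustrated with a tiny message format. The sketch below is only an illustration: the layout (1-byte message type, 2-byte payload length, then the payload) and the names HEADER, MSG_REQUEST, encode and decode are assumptions, not part of any standard protocol.

```python
import struct

# Hypothetical wire format, assumed for illustration only:
#   1 byte  message type | 2 bytes payload length | payload bytes
# "!" selects network (big-endian) byte order with no padding.
HEADER = struct.Struct("!BH")
MSG_REQUEST = 1  # semantics: the payload is a request


def encode(msg_type: int, payload: bytes) -> bytes:
    """Syntax: lay the fields out in the agreed order and sizes."""
    return HEADER.pack(msg_type, len(payload)) + payload


def decode(message: bytes):
    """Read the header, then slice out exactly 'length' payload bytes."""
    msg_type, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return msg_type, payload


wire = encode(MSG_REQUEST, b"hello")
print(wire)          # the bytes as they would appear on the network
print(decode(wire))  # the receiver recovers type and payload
```

Both sides must agree on this layout in advance; timing (when and how fast to send) is not covered by the sketch.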

Two computers, possibly from different manufacturers, must be able to talk to
each other
For such communication, there has to be a standard
The ISO OSI (Open Systems Interconnection) Reference Model is one such
standard (7 layers)
The TCP/IP protocol suite is another; it is described with 4 or 5 layers, depending on the source
OSI
Open – to connect open systems, i.e., systems that are open for
communication with other open systems using standard rules that
govern the format, contents, and meaning of the messages sent and
received
These rules are called protocols
Two types of transport layer protocols: connection-oriented and
connectionless

Lower-Level Protocols
The three lowest layers of the OSI protocol suite together
implement the basic functions of a computer network:
 The physical layer
 The data link layer
 The network layer

[Figure: a typical message as it appears on the network]

Transport Protocols: Client-Server TCP
Normal operation of TCP
Assuming no messages are lost,
 The client initiates connection setup using a
three-way handshake (1-3)
 The client sends its request (4)
 It then sends a message to close the
connection (5)
 The server acknowledges receipt and informs
the client that the connection will be closed
down (6)
 The server then sends the answer (7) followed by a
request to close the connection (8)
 The client responds with an ack to finish the
conversation (9)
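From the application's point of view, the exchange above can be sketched with the standard socket module. This is a minimal illustration, assuming both ends run on the local machine; the function names serve_once and request_reply and the upper-casing "service" are hypothetical. The handshake and acknowledgements are done by the operating system, so only the request, the half-close and the answer are visible in the code.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection, read the whole request, send one answer."""
    conn, _ = listener.accept()          # handshake already completed by the OS
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:                 # client closed its sending side
                break
            chunks.append(data)
        conn.sendall(b"".join(chunks).upper())   # the answer
    # leaving the 'with' block closes the connection from the server side


def request_reply(message: bytes) -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)              # the request
        client.shutdown(socket.SHUT_WR)      # tell the server no more data follows
        reply = b""
        while True:
            data = client.recv(1024)
            if not data:                     # server closed the connection
                break
            reply += data

    server.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(request_reply(b"hello"))
```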

Transactional TCP
 Much of the overhead in TCP is for managing the connection
 The client sends a single message consisting of a
setup request, the service request, and a notice to
the server that the connection will be closed down
immediately after the answer is received (1)
 The server sends acceptance of the connection request,
the answer, and a connection release (2)
 The client acknowledges the teardown of the connection
(3)
 This combines connection setup with the request, and closing the
connection with the answer
 Such a protocol is called TCP for Transactions
(T/TCP)

Higher-Level Protocols
Above the transport layer, OSI distinguishes three additional layers: session,
presentation, and application. In
practice, only the application layer is ever used. In fact, in the Internet
protocol suite, everything above the transport layer is grouped
together.
The session layer is essentially an enhanced version of the transport
layer. It provides dialog control, to keep track of which party is
currently talking, and it provides synchronization facilities.

Higher-Level Protocols (cont…)
Unlike the lower layers, which are concerned with getting the bits from
the sender to the receiver reliably and efficiently, the presentation
layer is concerned with the meaning of the bits.
Most messages do not consist of random bit strings, but of more structured
information such as people's names, addresses, amounts of money, and
so on.

Application Protocols
 File transfer (FTP - File Transfer Protocol)
 HTTP - Hypertext Transfer Protocol for accessing data on the WWW
Middleware Protocols
Middleware is an application that contains general-purpose protocols to
provide services to other applications
Examples of middleware services
 Authentication and authorization services
 Distributed transactions (commit protocols; locking mechanisms)
 Middleware communication protocols (calling a procedure or invoking an object
remotely, synchronizing streams for real-time data, multicast services)
Hence, an adapted reference model for networked communication is required

[Figure: an adapted reference model for networked communication]

Remote Procedure Call
The first distributed systems were based on explicit message exchange
between processes through explicit send and receive procedures; these
do not provide access transparency
In 1984, Birrell and Nelson introduced a different way of handling
communication: RPC
It allows a program to call a procedure located on another machine
Simple and elegant, but there are implementation problems
The calling and called procedures run in different address spaces
Parameters and results have to be exchanged
What if the machines are not identical?
What happens if one or both machines crash?

Conventional Procedure Call, i.e., on a single machine
e.g., count = read(fd, buf, bytes); a C-like statement, where
fd is an integer indicating a file
buf is an array of characters into which data are read
bytes is the number of bytes to be read
 Parameters can be call-by-value (fd and bytes) or call-by-reference (buf),
or in some languages call-by-copy/restore
[Figure: parameter passing in a local procedure call: the stack before the call to read, and the stack while the called procedure is active]

Client and Server Stubs
RPC tries to make a remote procedure call look the same as a local
one; it should be transparent, i.e., the calling procedure should not know
that the called procedure is executing on a different machine, or vice versa
When the program is compiled, remote calls are linked against a different version of the library function, called a
client stub; a server stub is the server-side equivalent of a client stub
[Figure: principle of RPC between a client and server program]

Steps of a Remote Procedure Call
1. Client procedure calls client stub in the normal way
2. Client stub builds a message and calls the local OS (packing parameters
into a message is called parameter marshaling)
3. Client's OS sends the message to the remote OS
4. Remote OS gives the message to the server stub
5. Server stub unpacks the parameters and calls the server
6. Server does the work and returns the result to the stub
7. Server stub packs it in a message and calls the local OS
8. Server's OS sends the message to the client's OS
9. Client's OS gives the message to the client stub
10. Stub unpacks the result and returns to client
 Hence, the client accesses remote services by making ordinary (local) procedure
calls, not by calling send and receive
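The ten steps can be traced in a small in-process sketch. It is only an illustration under stated assumptions: JSON is used as the message format, the function send_to_server stands in for both operating systems and the network, and all names (add, client_stub_add, server_stub) are hypothetical.

```python
import json


def add(i, j):
    """The server procedure that does the actual work (step 6)."""
    return i + j


def server_stub(request: bytes) -> bytes:
    call = json.loads(request)                       # step 5: unpack the parameters
    result = add(*call["params"])                    # steps 5-6: call the server
    return json.dumps({"result": result}).encode()   # step 7: pack the result


def send_to_server(message: bytes) -> bytes:
    """Stand-in for steps 3-4 and 8-9 (client OS, network, server OS)."""
    return server_stub(message)


def client_stub_add(i, j):
    # step 2: build a message (parameter marshaling)
    request = json.dumps({"proc": "add", "params": [i, j]}).encode()
    reply = send_to_server(request)
    return json.loads(reply)["result"]               # step 10: unpack the result


# step 1: the client calls the stub like an ordinary local procedure
total = client_stub_add(3, 4)
print(total)
```

The caller never sees send or receive; only the stubs know that a message is exchanged.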

Parameter Passing
1. Passing Value Parameters
e.g., consider a remote procedure add(i, j), where i and j are integer parameters
[Figure: steps involved in doing remote computation through RPC]
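When the two machines represent integers differently, the stubs must agree on one representation for the message. The sketch below assumes 32-bit signed integers in network (big-endian) byte order for the parameters of add(i, j); the format choice and the function names are assumptions for illustration.

```python
import struct

# Assumed message layout for add(i, j): two 32-bit signed integers,
# network (big-endian) byte order, regardless of the local CPU.
ADD_ARGS = struct.Struct("!ii")


def marshal_add_args(i: int, j: int) -> bytes:
    return ADD_ARGS.pack(i, j)


def unmarshal_add_args(message: bytes):
    return ADD_ARGS.unpack(message)


message = marshal_add_args(1, 2)
print(message.hex())                 # same bytes on every machine
print(unmarshal_add_args(message))   # values recovered by the server stub

# For comparison: the same integer laid out in little-endian order
print(struct.pack("<i", 1).hex())
```

If each side simply copied its native memory layout into the message, a little-endian sender and a big-endian receiver would disagree about the value.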

2. Passing Reference Parameters
 Assume the parameter is a pointer to an array
 Copy the array into the message and send it to the server
 The server stub can then call the server with a pointer to this array
 The server then makes any changes to the array and sends it back to the
client stub which copies it to the client
 This is in effect call-by-copy/restore
Optimization of the method
 One of the copy operations can be eliminated if the stub knows
whether the parameter is input or output to the server
 If it is an input to the server (e.g., in a call to write), it need not be
copied back
 If it is an output, it need not be sent over in the first place; only send
the size
 The above procedure can handle pointers to simple arrays and
structures, but it is difficult to generalize to arbitrary data structures
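Call-by-copy/restore for an array parameter can be sketched as follows. This is an illustration only: JSON is assumed as the message format, the server runs in the same process, and the names server_fill_squares, server_stub and client_stub_fill_squares are hypothetical.

```python
import json


def server_fill_squares(buf: list) -> None:
    """Server procedure: modifies the array it was given, in place."""
    for k in range(len(buf)):
        buf[k] = k * k


def server_stub(request: bytes) -> bytes:
    array = json.loads(request)          # the server works on its own copy
    server_fill_squares(array)           # called with a reference to that copy
    return json.dumps(array).encode()    # send the modified array back


def client_stub_fill_squares(buf: list) -> None:
    request = json.dumps(buf).encode()   # copy: the array goes into the message
    reply = server_stub(request)         # stands in for the network round trip
    buf[:] = json.loads(reply)           # restore: overwrite the caller's array


data = [0, 0, 0, 0]
client_stub_fill_squares(data)
print(data)
```

For this particular procedure the array is output only, so the optimization described above would send just its size instead of its contents.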

Asynchronous RPC
 Used when there is no need to block the client until it gets a reply (two cases)
1. If there is no result to be returned
e.g., adding entries to a database, ...
The server immediately sends an ack promising that it will carry out
the request
The client can now proceed without blocking
[Figure: (a) the interaction between client and server in a traditional RPC; (b) the interaction using asynchronous RPC]

2. If the result can be collected later
 e.g., prefetching network addresses of a set of hosts, ...
 The server immediately sends an ack promising that it will carry out the
request
 The client can now proceed without blocking
 The server later sends the result
[Figure: a client and server interacting through two asynchronous RPCs]

 The above method combines two asynchronous RPCs and is
sometimes called deferred synchronous RPC
 Variants of asynchronous RPC
 Let the client continue without waiting even for an ack; this is called one-way RPC
 Problem: if communication is not reliable, the client cannot know whether its request was processed
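The client's view of a deferred synchronous call can be approximated with futures. This is a sketch, not an RPC implementation: worker threads stand in for the remote server, and the procedure lookup_address with its small address table is hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def lookup_address(host: str) -> str:
    """Hypothetical remote procedure; the sleep imitates network delay."""
    time.sleep(0.05)
    return {"alpha": "10.0.0.1", "beta": "10.0.0.2"}[host]


def deferred_lookup(hosts):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Issue the requests; submit() returns at once, like receiving the ack
        futures = {host: pool.submit(lookup_address, host) for host in hosts}

        # The client proceeds with other work instead of blocking
        other_work = sum(range(1000))

        # Later, collect the results (blocks only if one is not ready yet)
        results = {host: future.result() for host, future in futures.items()}
    return other_work, results


if __name__ == "__main__":
    print(deferred_lookup(["alpha", "beta"]))
```

A one-way RPC would correspond to submitting the request and never calling result(), which is why the client cannot tell whether it was processed.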

RPC Programming Process
Dividing the program into local and remote procedures.
[Figure: Proc A → Client Stub → RPC → Server Stub → Proc B]

RPC Dispatching (Procedure Location)
[Figure: Proc A1 and Proc A2 each call through a client stub; on the server, a dispatcher passes each incoming RPC to the server stub of Proc B1 or Proc B2]

RPC Interface Specification
[Figure: Proc A with its client interface and client communication module, connected by RPC to the server communication module and server interface in front of Proc B]

RPC General Build Procedure
Develop Interface
Develop Client Develop Server
2/9/2024 105

Remote Object (Method) Invocation (RMI)
RMI resulted from object-based technology, which has proven its value in developing
non-distributed applications; it is an extension of the RPC mechanism
It enhances distribution transparency because an object hides its
internals from the outside world by means of a well-defined interface
Distributed Objects: An object encapsulates data, called the state, and the
operations on those data, called methods
 Methods are made available through interfaces
 The state of an object can be manipulated only by invoking methods
This allows an interface to be placed on one machine while the object itself
resides on another machine; such an organization is referred to as a
distributed object

 The state of such an object is not distributed; only its interfaces are made available
on other machines. Such objects are also referred to as remote objects
 The client-side implementation of an object’s interface is called a proxy
(analogous to a client stub in RPC systems)
 It is loaded into the client’s address space when a client binds to a
distributed object
 Tasks: a proxy marshals method invocations into messages and
unmarshals reply messages to return the result of the method
invocation to the client
 A server stub, called a skeleton, unmarshals messages and marshals
replies
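The division of work between proxy and skeleton can be sketched in a few lines. This is an in-process illustration, not how Java RMI or DCE is implemented: JSON is assumed as the message format, and the classes Counter, Skeleton and Proxy are hypothetical.

```python
import json


class Counter:
    """The remote object: its state lives only on the server side."""

    def __init__(self):
        self.value = 0

    def increment(self, amount):
        self.value += amount
        return self.value


class Skeleton:
    """Server side: unmarshals requests, invokes the object, marshals replies."""

    def __init__(self, obj):
        self._obj = obj

    def handle(self, request: bytes) -> bytes:
        call = json.loads(request)
        method = getattr(self._obj, call["method"])
        result = method(*call["args"])
        return json.dumps({"result": result}).encode()


class Proxy:
    """Client side: offers the object's methods, but only builds messages."""

    def __init__(self, transport):
        self._transport = transport   # callable: request bytes -> reply bytes

    def __getattr__(self, name):
        # Called only for names the proxy itself lacks, i.e. remote methods
        def invoke(*args):
            request = json.dumps({"method": name, "args": list(args)}).encode()
            reply = self._transport(request)
            return json.loads(reply)["result"]
        return invoke


remote_object = Counter()
counter = Proxy(Skeleton(remote_object).handle)  # direct call replaces the network
print(counter.increment(5))
print(counter.increment(2))
```

The client holds no state of its own; every invocation travels to the object as a message.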

[Figure: common organization of a remote object with client-side proxy]

Binding a Client to an Object
A process must first bind to an object before invoking its methods, which
results in a proxy being placed in the process’s address space
Binding can be implicit (directly invoke methods using only a reference to an
object) or explicit (by calling a special function)
An object reference could contain
 Network address of the machine where the object resides
 Endpoint of the server
 An identification of which object
 The protocol used ...

Parameter Passing
 There are two situations when invoking a method with an object
reference as a parameter: the object can be local or remote to the
client
 Local object: a copy of the object is passed; this means the object is
passed by value
 Remote object: the reference to the object is copied and passed as a value
parameter; this means the object is passed by reference

[Figure: the situation when passing an object by reference or by value]
 Two examples:
 DCE Remote Objects
 Java RMI
Read R1: pages 93-98

Stream-Oriented Communication
Until now, we have focused on exchanging independent and complete units of information,
where time has no effect on correctness; a system can be slow or fast. However,
there are forms of communication where time plays a critical role
Multimedia
Media: storage, transmission, interchange, presentation, representation and
perception of different data types:
text, graphics, images, voice, audio, video, animation, ...
Movie: video + audio + …
Multimedia: handling of a variety of representation media
End user pull: information overload and starvation
Technology push: emerging technology to integrate media

The Challenge
New applications
 multimedia will be pervasive in a few years (as graphics already is)
Storage and transmission
 e.g., a 2-hour uncompressed HDTV (1920×1080) movie: 1.12 TB
(1920 × 1080 × 3 × 25 × 60 × 60 × 2 bytes)
 videos are extremely large, even after being compressed (actually encoded)
Continuous delivery
 e.g., 30 frames/s (NTSC), 25 frames/s (PAL) for video
 guaranteed Quality of Service
 admission control
Search
 can we look at 100… videos to find the proper one?
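The storage figure can be checked by spelling out the factors. The interpretation of the factors is an assumption read off the formula on the slide: 3 bytes per pixel (24-bit colour), 25 frames per second, and 60 × 60 × 2 seconds for two hours.

```python
WIDTH, HEIGHT = 1920, 1080        # pixels per frame
BYTES_PER_PIXEL = 3               # assumption: 24-bit colour
FRAMES_PER_SECOND = 25
DURATION_SECONDS = 2 * 60 * 60    # two hours

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
bytes_per_second = bytes_per_frame * FRAMES_PER_SECOND
total_bytes = bytes_per_second * DURATION_SECONDS

print(f"{total_bytes:,} bytes")
print(f"{total_bytes / 10**12:.2f} TB (decimal terabytes)")
print(f"{bytes_per_second * 8 / 10**9:.2f} Gbit/s needed for continuous delivery")
```

The last line shows why continuous delivery is a challenge as well: the uncompressed stream needs well over a gigabit per second.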

Types of Media
Two types
 Discrete media: text, executable code, graphics, images; temporal
relationships between data items are not fundamental to correctly
interpret the data
 Continuous media: video, audio, animation; temporal relationships between
data items are fundamental to correctly interpret the data
A data stream is a sequence of data units and can be applied to discrete as well
as continuous media
Stream-oriented communication provides facilities for the exchange of time-
dependent information (continuous media) such as audio and video streams

Timing in transmission modes
Asynchronous transmission mode: data items are transmitted one after the
other, but with no timing constraints; e.g., text transfer
Synchronous transmission mode: a maximum end-to-end delay is defined for each
data unit; data can be transmitted faster than the
maximum delay, but not slower
Isochronous transmission mode: both a maximum and a minimum end-to-end delay are
defined; also called bounded delay jitter; applicable to distributed
multimedia systems
A continuous data stream can be simple or complex
Simple stream: consists of a single sequence of data; e.g., mono audio, video only
(only visual frames)
Complex stream: consists of several related simple streams that must be
synchronized; e.g., stereo audio, a movie consisting of audio and video (may also
contain subtitles, translation to other languages, ...)
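The difference between the synchronous and isochronous constraints can be expressed as two checks on measured end-to-end delays. The function names and the sample delays (in milliseconds) are made up for illustration.

```python
def meets_synchronous(delays_ms, max_delay_ms):
    """Synchronous mode: every data unit arrives within the maximum delay."""
    return all(d <= max_delay_ms for d in delays_ms)


def meets_isochronous(delays_ms, min_delay_ms, max_delay_ms):
    """Isochronous mode: every delay lies between the minimum and the maximum."""
    return all(min_delay_ms <= d <= max_delay_ms for d in delays_ms)


# Hypothetical measured delays of four data units
delays = [40, 55, 48, 20]

print(meets_synchronous(delays, max_delay_ms=60))                    # no unit is too late
print(meets_isochronous(delays, min_delay_ms=30, max_delay_ms=60))   # one unit is too early
```

Asynchronous mode places no constraint on the delays, so it needs no check at all.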

Unicast, Broadcast versus Multicast
Unicast
One-to-one
Destination – unique receiver host address
Broadcast
One-to-all
Destination – address of network
Multicast
One-to-many
Multicast group must be identified
Destination – address of group
[Figure key: unicast transfer, broadcast transfer, multicast transfer]

Multicast application examples
Financial services
Delivery of news, stock quotes, financial indices, etc
Remote conferencing/e-learning
Streaming audio and video to many participants (clients,
students)
Interactive communication between participants
Data distribution
e.g., distribute experimental data from Large Hadron Collider
(LHC) at CERN lab to interested physicists around the world

Introduction to Web Services
Microsoft coined the term “Web services” in June 2000, when the company
introduced Web services as a key component of its .NET initiative,
a new vision for embracing the Internet in the development, engineering and use
of software.
As others began to investigate Web services, it became clear that the technology
could revolutionise distributed computing.
Now, nearly every major vendor is marketing Web services tools and applications,
and Web services are radically changing IT architectures and partner
relationships.

Web services encompass a set of related standards that can allow any two
computers to communicate and exchange data via a network, such as the
Internet.
The primary standard used in Web services is the Extensible Markup Language
(XML) developed by the World Wide Web Consortium (W3C).
Developers use XML tags to describe individual pieces of data, forming XML
documents, which are text-based and can be processed on any platform.
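As a small illustration of tags describing individual pieces of data, the sketch below parses a text-based XML document with Python's standard library. The document itself (an order with a customer and an item) and all its tag names are made up for this example.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML document: tags name each piece of data
document = """<order id="42">
  <customer>Abebe</customer>
  <item quantity="2">printer</item>
</order>"""

root = ET.fromstring(document)     # plain text in, element tree out

customer = root.find("customer").text
item = root.find("item")

print(root.tag, root.get("id"))
print(customer)
print(item.text, item.get("quantity"))
```

Because the document is plain text, any platform with an XML parser can process it in the same way.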

XML’s portability and its rapid adoption throughout the industry made it
the obvious choice for enabling cross-platform data communication in
Web services.
XML provides the foundation for many core Web services standards:
1. SOAP,
2. WSDL,
3. UDDI,
plus vocabularies of XML-based markup for a specific industry or purpose.
Almost every type of business can benefit from Web services, for example by:
 Expediting software development,
 Integrating applications and databases,
 Automating transactions with suppliers, partners, and clients.

 SOAP (originally the Simple Object Access Protocol) is an XML
vocabulary that lets programs on separate computers interact across a
network (via RPC).
 WSDL (Web Services Description Language) is another XML vocabulary that
lets developers describe Web services and their capabilities in a standardised
format.
 UDDI (Universal Description, Discovery and Integration) is a framework that
defines XML-based registries where businesses can publish information about
themselves and the services they offer.
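A SOAP message used as an RPC request is an XML envelope whose body names the procedure and carries its parameters. The sketch below builds and reads such an envelope with the standard library only; the envelope namespace is the SOAP 1.1 one, while the application namespace urn:example:calculator, the operation add and the function names are assumptions for illustration. Real services normally use a SOAP toolkit and a WSDL description rather than hand-built XML.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
APP_NS = "urn:example:calculator"                       # hypothetical service namespace


def build_request(i: int, j: int) -> bytes:
    """Client side: wrap the call add(i, j) in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}add")
    ET.SubElement(call, "i").text = str(i)
    ET.SubElement(call, "j").text = str(j)
    return ET.tostring(envelope, encoding="utf-8")


def handle_request(xml_bytes: bytes) -> int:
    """Server side: find the call inside the body and carry it out."""
    envelope = ET.fromstring(xml_bytes)
    call = envelope.find(f"{{{SOAP_NS}}}Body/{{{APP_NS}}}add")
    return int(call.find("i").text) + int(call.find("j").text)


request = build_request(3, 4)
print(request.decode("utf-8"))   # text-based, so it can be read while debugging
print(handle_request(request))
```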

Web Services’ Applications
Unfortunately, interoperability, the ability to communicate and share data with
software from different vendors and platforms, is limited among conventional
proprietary technologies, e.g. DCE, CORBA, DCOM and RMI.
Web services improve distributed computing interoperability by using open
(non-proprietary) standards that can enable (theoretically) any two software
components to communicate.
They are also easier to debug because they use text-based, rather than
binary, communication protocols.

The Advantages of Web Services
Web services:
Use open, text-based standards, which allow components written in different
languages and for different platforms to communicate,
Promote a modular approach to programming, so multiple organisations can
communicate with the same Web service,
Are comparatively easy and inexpensive to implement, because they employ an
existing infrastructure and because most applications can be repackaged as
Web services,
Significantly reduce the costs of enterprise application integration (EAI) and
B2B communications,
Can be implemented incrementally, rather than all at once, which lessens the cost and
reduces the organisational disruption of an abrupt switch in technologies,
Are backed by the Web Services Interoperability Organisation (WS-I), a group of over
100 vendors that promotes interoperability.

Web Services’ Challenges
The standards that drive Web services are still in draft form and will always be under
refinement.
Some vendors want to retain their intellectual property rights to certain Web
services standards.
Web services need standard security procedures, a problem common to all of
distributed computing.
The leading registry, based on the UDDI specification, has some key
limitations, and alternative discovery methods are provided by ebXML and
WS-Inspection.
Web services need Quality of Service (QoS) support from Web Services
Registries, Brokerages, and Network Providers.

Web Services Basics
Web services:
Software programs that use XML to exchange information with other
software via common Internet protocols:
 Scalable - from, e.g., multiplying two numbers together to an entire
customer-relationship management system,
 Programmable - encapsulates a task,
 Based on XML - open, text-based standard,
 Self-describing - metadata for access and use,
 Discoverable - search and locate in registries.

Architecture of Web Service
A web service is a network accessible interface to application functionality,
built using standard Internet technologies.
Clients of a web service do NOT need to know how it is implemented.
[Figure: an application client reaches the application code over the network through the Web service interface]

Web Services
1. Client queries registry to locate service.
2. Registry refers client to WSDL document.
3. Client accesses WSDL document.
4. WSDL provides data to interact with Web service.
5. Client sends SOAP-message request.
6. Web service returns SOAP-message response.
[Figure: the client, the UDDI registry, the WSDL document and the Web service, with arrows numbered 1-6 for the steps above]

Web Service Technology Stack
Discovery
Description
Packaging
Transport
Network
[Figure: the client asks UDDI for a shopping web service and receives WSDL URIs; a proxy built from the WSDL exchanges SOAP request and response packages with the Web service]

Step 1. Write Web Service Method
Step 2. Describe Web Service using WSDL
Step 3. Write Proxy to Access Web Service
Step 4. Write Client to Invoke Proxy

SOAP (Simple Object Access Protocol)
SOAP Messages: Using SOAP as RPC (Remote Procedure Call) Messages
[Figure: a SOAP client sends a request message to a SOAP server, which returns a response message]
* Read about Distributed Objects and Components
