2. Unit II
Interprocess Communication: The API for the Internet
Protocols – External data representation and marshalling –
Client–server communication – Group communication.
Distributed Objects – Communication between distributed
objects – Remote procedure call.
3. Interprocess Communication
● Interprocess communication in the Internet provides both datagram and stream
communication. The Java APIs for these are presented, together with a
discussion of their failure models.
● They provide alternative building blocks for communication protocols.
● This is complemented by a study of protocols for the representation of collections
of data objects in messages and of references to remote objects
4. The API for the Internet protocols
● In this section, we discuss the general characteristics of interprocess
communication and then discuss the Internet protocols as an example, explaining
how programmers can use them, either by means of UDP messages or through TCP
streams
1. Characteristics of interprocess communication
● Message passing between a pair of processes can be supported by two message
communication operations, send and receive, defined in terms of destinations and
messages.
● To communicate, one process sends a message (a sequence of bytes) to a
destination and another process at the destination receives the message.
● This activity involves the communication of data from the sending process to the
receiving process and may involve the synchronization of the two processes
5. (i) Synchronous and asynchronous communication
● A queue is associated with each message destination. Sending processes cause messages
to be added to remote queues and receiving processes remove messages from local
queues.
● Communication between the sending and receiving processes may be either synchronous
or asynchronous.
● In the synchronous form of communication, the sending and receiving processes
synchronize at every message. In this case, both send and receive are blocking
operations. Whenever a send is issued the sending process (or thread) is blocked until
the corresponding receive is issued. Whenever a receive is issued by a process (or
thread), it blocks until a message arrives.
● In the asynchronous form of communication, the use of the send operation is
nonblocking in that the sending process is allowed to proceed as soon as the message has
been copied to a local buffer, and the transmission of the message proceeds in parallel
with the sending process.
● The receive operation can have blocking and non-blocking variants. In the non-blocking
variant, the receiving process proceeds with its program after issuing a receive
operation, which provides a buffer to be filled in the background, but it must separately
receive notification that its buffer has been filled, by polling or interrupt.
6. ● In a system environment such as Java, which supports multiple threads in
a single process, the blocking receive has no disadvantages, for it can be
issued by one thread while other threads in the process remain active,
and the simplicity of synchronizing the receiving threads with the
incoming message is a substantial advantage.
● Non-blocking communication appears to be more efficient, but it involves
extra complexity in the receiving process associated with the need to
acquire the incoming message out of its flow of control.
● For these reasons, today’s systems do not generally provide the
nonblocking form of receive.
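The point about issuing a blocking receive from one thread while others stay active can be sketched as follows. This is only an illustration: an in-process BlockingQueue stands in for the message queue at a destination, and the class and method names (BlockingReceiveDemo, receiveInThread) are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: a local queue plays the role of the queue attached to a message destination.
class BlockingReceiveDemo {
    // Starts a receiver thread that blocks in take(), sends one message, and returns what was received.
    static String receiveInThread(BlockingQueue<String> destination) {
        final String[] received = new String[1];
        Thread receiver = new Thread(() -> {
            try {
                received[0] = destination.take();   // blocking receive: waits until a message arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();
        destination.offer("hello");                 // send returns as soon as the message is queued
        try {
            receiver.join();                        // the calling thread was free to work until here
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return received[0];
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        System.out.println("Received: " + receiveInThread(queue));
        // poll() is the non-blocking variant: it returns null at once if nothing is queued
        System.out.println("Non-blocking poll on empty queue: " + queue.poll());
    }
}
```

Only the receiver thread waits; the thread that called receiveInThread continues until it chooses to join.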
7. (ii) Message destinations
● A local port is a message destination within a computer, specified as an integer.
● A port has exactly one receiver but can have many senders. Processes may use multiple
ports to receive messages.
● Any process that knows the number of a port can send a message to it. Servers
generally publicize their port numbers for use by clients
● If the client uses a fixed Internet address to refer to a service, then that service
must always run on the same computer for its address to remain valid.
● This can be avoided by using the following approach to providing location transparency:
Client programs refer to services by name and use a name server or binder to
translate their names into server locations at runtime. This allows services to be
relocated but not to migrate – that is, to be moved while the system is running.
8. (iii) Reliability
● Reliable communication is defined in terms of two properties: validity and integrity.
● As far as the validity property is concerned, a point-to-point message
service can be described as reliable if messages are guaranteed to be
delivered despite a ‘reasonable’ number of packets being dropped or lost.
● In contrast, a point-to-point message service can be described as
unreliable if messages are not guaranteed to be delivered in the face of
even a single packet dropped or lost.
● For integrity, messages must arrive uncorrupted and without duplication.
9. (iv) Ordering
● Some applications require that messages be delivered in sender
order – that is, the order in which they were transmitted by the
sender.
● The delivery of messages out of sender order is regarded as a
failure by such applications.
10. 2. Sockets
Both forms of communication (UDP and TCP) use the socket abstraction, which
provides an endpoint for communication between processes.
Interprocess communication consists of transmitting a message between a socket
in one process and a socket in another process, as illustrated in Figure 4.2
11. ● For a process to receive messages, its socket must be bound to a local
port and one of the Internet addresses of the computer on which it runs.
● Messages sent to a particular Internet address and port number can be
received only by a process whose socket is associated with that Internet
address and port number.
● Processes may use the same socket for sending and receiving messages
● Each computer has a large number (2^16) of possible port numbers
for use by local processes for receiving messages.
● Any process may make use of multiple ports to receive messages, but a
process cannot share ports with other processes on the same computer.
● However, any number of processes may send messages to the same port.
Each socket is associated with a particular protocol – either UDP or TCP.
12. Java API for Internet addresses:
● As the IP packets underlying UDP and TCP are sent to Internet addresses,
Java provides a class, InetAddress, that represents Internet addresses.
● Users of this class refer to computers by Domain Name System (DNS)
hostnames. The static method getByName uses the DNS to get the
corresponding Internet address.
● For example, to get an object representing the Internet address of the host
whose DNS name is bruno.dcs.qmul.ac.uk, use:
InetAddress aComputer = InetAddress.getByName("bruno.dcs.qmul.ac.uk");
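A minimal, runnable sketch of the same call is shown below. It assumes a literal address ("127.0.0.1") as the default argument so that it works without a DNS server; the class name InetAddressDemo and the helper resolve are illustrative only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class InetAddressDemo {
    // Resolves a DNS hostname or a literal IP address to its textual Internet address.
    // A literal address such as "127.0.0.1" is parsed directly, without a DNS lookup.
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;   // the name could not be resolved
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        System.out.println(host + " -> " + resolve(host));
    }
}
```

Note that getByName throws the checked UnknownHostException, so callers must handle or declare it.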
13. 3. UDP datagram communication
● A datagram sent by UDP is transmitted from a sending process to a receiving
process without acknowledgement or retries.
● If a failure occurs, the message may not arrive. A datagram is transmitted
between processes when one process sends it and another receives it.
● To send or receive messages a process must first create a socket bound to an
Internet address of the local host and a local port.
● A server will bind its socket to a server port – one that it makes known to
clients so that they can send messages to it.
● A client binds its socket to any free local port.
● The receive method returns the Internet address and port of the sender, in
addition to the message, allowing the recipient to send a reply.
14. The following are some issues relating to datagram communication
(i) Message size:
● The receiving process needs to specify an array of bytes of a particular size in
which to receive a message.
● If the message is too big for the array, it is truncated on arrival.
● The underlying IP protocol allows packet lengths of up to 2^16 bytes, which
includes the headers as well as the message.
● However, most environments impose a size restriction of 8 kilobytes. Any
application requiring messages larger than the maximum must fragment them into
chunks of that size
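The fragmentation step can be sketched as below. The 8-kilobyte limit is taken from the text; the class name DatagramChunker and its method are hypothetical, and a real application would also need to number the chunks, since datagrams may arrive out of order or not at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DatagramChunker {
    static final int MAX_CHUNK = 8 * 1024;   // assumed per-datagram limit, as stated in the text

    // Splits a message into consecutive chunks of at most chunkSize bytes.
    static List<byte[]> split(byte[] message, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < message.length; offset += chunkSize) {
            int end = Math.min(message.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(message, offset, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<byte[]> chunks = split(new byte[20000], MAX_CHUNK);
        for (byte[] c : chunks) {
            System.out.println("chunk of " + c.length + " bytes");
        }
    }
}
```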
15. (ii) Blocking:
● Sockets normally provide non-blocking sends and blocking receives for datagram
communication (a non-blocking receive is an option in some implementations).
● The send operation returns when it has handed the message to the underlying UDP and
IP protocols, which are responsible for transmitting it to its destination. On arrival, the
message is placed in a queue for the socket that is bound to the destination port.
● The message can be collected from the queue by an outstanding or future invocation of
receive on that socket. Messages are discarded at the destination if no process already
has a socket bound to the destination port.
● The method receive blocks until a datagram is received, unless a timeout has been set
on the socket. If the process that invokes the receive method has other work to do
while waiting for the message, it should arrange to use a separate thread.
16. (iii) Timeouts:
● The receive that blocks forever is suitable for use by a server that is waiting
to receive requests from its clients.
● But in some programs, it is not appropriate that a process that has invoked a
receive operation should wait indefinitely in situations where the sending
process may have crashed or the expected message may have been lost.
● To allow for such requirements, timeouts can be set on sockets.
● Choosing an appropriate timeout interval is difficult, but it should be fairly
large in comparison with the time required to transmit a message.
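As an illustration of a bounded wait, the sketch below sets a timeout on a datagram socket and reports whether anything arrived. The helper name receivedWithin and the choice to bind to the loopback address are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;

class ReceiveTimeoutDemo {
    // Returns true if a datagram arrived within timeoutMillis, false if the receive timed out.
    static boolean receivedWithin(int timeoutMillis) {
        // Port 0 asks the system for any free local port.
        try (DatagramSocket socket = new DatagramSocket(0, InetAddress.getLoopbackAddress())) {
            socket.setSoTimeout(timeoutMillis);
            DatagramPacket packet = new DatagramPacket(new byte[1000], 1000);
            try {
                socket.receive(packet);          // blocks for at most timeoutMillis
                return true;
            } catch (SocketTimeoutException e) {
                return false;                    // nothing arrived in time
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Datagram received within 200 ms: " + receivedWithin(200));
    }
}
```

Since nobody sends to the freshly bound socket, the receive is expected to time out rather than block forever.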
17. (iv) Receive from any:
● The receive method does not specify an origin for messages.
● Instead, an invocation of receive gets a message addressed to its socket from any
origin
● The receive method returns the Internet address and local port of the sender,
allowing the recipient to check where the message came from.
● It is possible to connect a datagram socket to a particular remote port and
Internet address, in which case the socket is only able to send messages to and
receive messages from that address.
18. Failure model for UDP datagrams
● The failure model for communication channels defines reliable communication in terms of two
properties: integrity and validity. The integrity property requires that messages should not be
corrupted or duplicated; the validity property requires that messages are guaranteed to be delivered.
UDP datagrams suffer from the following failures:
● Omission failures: Messages may be dropped occasionally, either because of a checksum error
or because no buffer space is available at the source or destination. To simplify the discussion,
we regard send-omission and receive-omission failures as omission failures in the
communication channel.
● Ordering: Messages can sometimes be delivered out of sender order.
19. Use of UDP
● For some applications, it is acceptable to use a service that is liable to occasional
omission failures.
● For example, the Domain Name System, which looks up DNS names in the Internet, is
implemented over UDP.
● Voice over IP (VOIP) also runs over UDP. UDP datagrams are sometimes an attractive
choice because they do not suffer from the overheads associated with guaranteed
message delivery.
● There are three main sources of overhead:
○ the need to store state information at the source and destination;
○ the transmission of extra messages;
○ latency for the sender.
20. Java API for UDP datagrams
● The Java API provides datagram communication by means of two classes:
DatagramPacket and DatagramSocket.
● DatagramPacket: This class provides a constructor that makes an instance out of an
array of bytes comprising a message, the length of the message and the Internet
address and local port number of the destination socket, in that order:
DatagramPacket(byte[] message, int length, InetAddress address, int port)
A second constructor, which takes only the byte array and its length, is used when receiving a message.
The message can be retrieved from the DatagramPacket by means of the method getData.
The methods getPort and getAddress access the port and Internet address.
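The accessors can be exercised without any network traffic, as in the sketch below. The class name DatagramPacketDemo, the helper names and the use of the loopback address are illustrative assumptions; port 6789 matches the example programs in these slides.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class DatagramPacketDemo {
    // Builds an outgoing packet addressed to the given port on the local host.
    static DatagramPacket makeRequest(String text, int port) {
        byte[] m = text.getBytes(StandardCharsets.UTF_8);
        return new DatagramPacket(m, m.length, InetAddress.getLoopbackAddress(), port);
    }

    // Extracts only the bytes that belong to the message; getData returns the whole buffer.
    static String textOf(DatagramPacket packet) {
        return new String(packet.getData(), packet.getOffset(), packet.getLength(),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        DatagramPacket request = makeRequest("hello", 6789);
        System.out.println("message: " + textOf(request));
        System.out.println("address: " + request.getAddress());
        System.out.println("port:    " + request.getPort());
    }
}
```

Using getLength when decoding matters on the receiving side, where the buffer is usually larger than the message.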
21. DatagramSocket: This class supports sockets for sending and receiving UDP datagrams. It
provides a constructor that takes a port number as its argument, for use by processes that need to
use a particular port. It also provides a no-argument constructor that allows the system to choose a
free local port. These constructors can throw a SocketException if the chosen port is already in
use or if a reserved port is specified.
The class DatagramSocket provides methods that include the following:
● send and receive: These methods are for transmitting datagrams between a pair of sockets.
The argument of send is an instance of DatagramPacket containing a message and its
destination. The argument of receive is an empty DatagramPacket in which to put the message,
its length and its origin. The methods send and receive can throw IOExceptions.
● setSoTimeout: This method allows a timeout to be set. With a timeout set, the receive
method will block for the time specified and then throw an InterruptedIOException
(in current Java versions, its subclass SocketTimeoutException).
● connect: This method is used for connecting to a particular remote port and Internet
address, in which case the socket is only able to send messages to and receive messages from
that address.
22. UDP client sends a message to the server and gets a reply
import java.net.*;
import java.io.*;
public class UDPClient{
    public static void main(String args[]){
        // args give message contents and server hostname
        DatagramSocket aSocket = null;
        try {
            aSocket = new DatagramSocket();     // the system chooses a free local port
            byte [] m = args[0].getBytes();
            InetAddress aHost = InetAddress.getByName(args[1]);
            int serverPort = 6789;
            // length is a field of a Java array, not a method
            DatagramPacket request = new DatagramPacket(m, m.length, aHost, serverPort);
            aSocket.send(request);
            byte[] buffer = new byte[1000];
            DatagramPacket reply = new DatagramPacket(buffer, buffer.length);
            aSocket.receive(reply);             // blocks until the reply arrives
            // decode only the bytes received, not the whole 1000-byte buffer
            System.out.println("Reply: " + new String(reply.getData(), 0, reply.getLength()));
        } catch (SocketException e){System.out.println("Socket: " + e.getMessage());
        } catch (IOException e){System.out.println("IO: " + e.getMessage());
        } finally { if(aSocket != null) aSocket.close();}
    }
}
23. UDP server repeatedly receives a request and sends it back to the client
import java.net.*;
import java.io.*;
public class UDPServer{
    public static void main(String args[]){
        DatagramSocket aSocket = null;
        try{
            aSocket = new DatagramSocket(6789);  // bind to the agreed server port
            byte[] buffer = new byte[1000];
            while(true){
                DatagramPacket request = new DatagramPacket(buffer, buffer.length);
                aSocket.receive(request);        // blocks until a request arrives
                // echo the request back to the address and port it came from
                DatagramPacket reply = new DatagramPacket(request.getData(),
                    request.getLength(), request.getAddress(), request.getPort());
                aSocket.send(reply);
            }
        } catch (SocketException e){System.out.println("Socket: " + e.getMessage());
        } catch (IOException e) {System.out.println("IO: " + e.getMessage());
        } finally {if (aSocket != null) aSocket.close();}
    }
}
24. TCP stream communication
● The Transmission Control Protocol (TCP) is a connection-oriented reliable protocol
and it is an Internet protocol that connects a server and a client.
● Data travels over the Internet in packets.
The following characteristics of the network are hidden by the stream abstraction:
● Message sizes: The application can choose how much data it writes to a stream or
reads from it. It may deal in very small or very large sets of data. The underlying
implementation of a TCP stream decides how much data to collect before transmitting
it as one or more IP packets. On arrival, the data is handed to the application as
requested.
● Lost messages: The TCP protocol uses an acknowledgement scheme. As an example of
a simple scheme (which is not used in TCP), the sending end keeps a record of each IP
packet sent and the receiving end acknowledges all the arrivals. If the sender does
not receive an acknowledgement within a timeout, it retransmits the message.
25. ● Flow control: The TCP protocol attempts to match the speeds of the processes that read
from and write to a stream. If the writer is too fast for the reader, then it is blocked
until the reader has consumed sufficient data.
● Message duplication and ordering: Message identifiers are associated with each IP
packet, which enables the recipient to detect and reject duplicates, or to reorder
messages that do not arrive in sender order.
● Message destinations: A pair of communicating processes establish a connection before
they can communicate over a stream. Once a connection is established, the processes
simply read from and write to the stream without needing to use Internet addresses and
ports. Establishing a connection involves a connect request from client to server followed
by an accept request from server to client before any communication can take place. This
could be a considerable overhead for a single client-server request and reply.
26. The following are some outstanding issues related to stream communication:
● Matching of data items: Two communicating processes need to agree as to the
contents of the data transmitted over a stream. For example, if one process writes an
int followed by a double to a stream, then the reader at the other end must read an int
followed by a double. When a pair of processes do not cooperate correctly in their use
of a stream, the reading process may experience errors when interpreting the data or
may block due to insufficient data in the stream.
● Blocking: The data written to a stream is kept in a queue at the destination socket.
When a process attempts to read data from an input channel, it will get data from the
queue or it will block until data becomes available. The process that writes data to a
stream may be blocked by the TCP flow-control mechanism if the socket at the other
end is queuing as much data as the protocol allows.
● Threads: When a server accepts a connection, it generally creates a new thread in
which to communicate with the new client. The advantage of using a separate thread for
each client is that the server can block when waiting for input without delaying other
clients. In an environment in which threads are not provided, an alternative is to test
whether input is available from a stream before attempting to read it; for example, in a
UNIX environment the select system call may be used for this purpose.
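The matching-of-data-items issue can be illustrated with the same DataOutputStream and DataInputStream classes that the TCP examples use. In this sketch an in-memory byte array stands in for the TCP stream, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class StreamMatchingDemo {
    // Writer side: an int followed by a double (4 + 8 bytes).
    static byte[] write(int i, double d) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(i);
            out.writeDouble(d);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader side: must read the items in the same order and with the same types.
    static String read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int i = in.readInt();
            double d = in.readDouble();
            return i + " " + d;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(42, 2.5);
        System.out.println(data.length + " bytes written, read back: " + read(data));
    }
}
```

If the reader called readDouble first, it would consume eight bytes spanning both items and produce a meaningless value, and the following readInt would then find too few bytes left.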
27. Failure model
● To satisfy the integrity property of reliable communication, TCP streams use checksums to
detect and reject corrupt packets and sequence numbers to detect and reject duplicate packets.
● For the sake of the validity property, TCP streams use timeouts and retransmissions to deal with
lost packets
● But if the packet loss over a connection passes some limit or the network connecting a pair of
communicating processes is severed or becomes severely congested, the TCP software
responsible for sending messages will receive no acknowledgements and after a time will declare
the connection to be broken.
When a connection is broken, a process using it will be notified if it attempts to read or write. This
has the following effects:
• The processes using the connection cannot distinguish between network failure and failure of the
process at the other end of the connection.
• The communicating processes cannot tell whether the messages they have sent recently have been
received or not.
28. Use of TCP:
HTTP: The Hypertext Transfer Protocol is used for communication between web browsers and
web servers;
FTP: The File Transfer Protocol allows directories on a remote computer to be browsed and
files to be transferred from one computer to another over a connection.
Telnet: Telnet provides access by means of a terminal session to a remote computer.
SMTP: The Simple Mail Transfer Protocol is used to send mail between computers.
Java API for TCP streams
The Java interface to TCP streams is provided in the classes ServerSocket and Socket:
ServerSocket: This class is intended for use by a server to create a socket at a server port
for listening for connect requests from clients. Its accept method gets a connect request
from the queue or, if the queue is empty, blocks until one arrives. The result of executing
accept is an instance of Socket – a socket to use for communicating with the client.
29. Socket:
● This class is for use by a pair of processes with a connection. The client uses a
constructor to create a socket, specifying the DNS hostname and port of a
server.
● This constructor not only creates a socket associated with a local port but also
connects it to the specified remote computer and port number.
● It can throw an UnknownHostException if the hostname is wrong or an
IOException if an IO error occurs.
● The Socket class provides the methods getInputStream and getOutputStream
for accessing the two streams associated with a socket. The return types of
these methods are InputStream and OutputStream, respectively
30. Client Program
import java.net.*;
import java.io.*;
public class TCPClient {
    public static void main (String args[]) {
        // arguments supply message and hostname of destination
        Socket s = null;
        try{
            int serverPort = 7896;
            s = new Socket(args[1], serverPort);
            DataInputStream in = new DataInputStream( s.getInputStream());
            DataOutputStream out = new DataOutputStream( s.getOutputStream());
            out.writeUTF(args[0]);
            String data = in.readUTF();
            System.out.println("Received: "+ data) ;
        } catch (UnknownHostException e){
            System.out.println("Sock:"+e.getMessage());
        } catch (EOFException e){System.out.println("EOF:"+e.getMessage());
        } catch (IOException e){System.out.println("IO:"+e.getMessage());
        } finally {if(s!=null) try {s.close();}catch (IOException e){/*close failed*/}}
    }
}
31. Server Program
import java.net.*;
import java.io.*;
public class TCPServer {
    public static void main (String args[]) {
        try{
            int serverPort = 7896;
            ServerSocket listenSocket = new ServerSocket(serverPort);
            while(true) {
                Socket clientSocket = listenSocket.accept();   // blocks until a client connects
                Connection c = new Connection(clientSocket);   // one thread per client
            }
        } catch(IOException e) {System.out.println("Listen:"+e.getMessage());}
    }
}
class Connection extends Thread {
    DataInputStream in;
    DataOutputStream out;
    Socket clientSocket;
    public Connection (Socket aClientSocket) {
        try {
            clientSocket = aClientSocket;
            in = new DataInputStream( clientSocket.getInputStream());
            out = new DataOutputStream( clientSocket.getOutputStream());
            this.start();
        } catch(IOException e) {System.out.println("Connection:"+e.getMessage());}
    }
    public void run(){
        try { // an echo server
            String data = in.readUTF();
            out.writeUTF(data);
        } catch(EOFException e) {System.out.println("EOF:"+e.getMessage());
        } catch(IOException e) {System.out.println("IO:"+e.getMessage());
        } finally { try {clientSocket.close();}catch (IOException e){/*close failed*/}}
    }
}
32. External data representation and marshalling
● The information stored in running programs is represented as data structures – for
example, by sets of interconnected objects – whereas the information in messages
consists of sequences of bytes. Irrespective of the form of communication used, the
data structures must be flattened (converted to a sequence of bytes) before
transmission and rebuilt on arrival.
● The individual primitive data items transmitted in messages can be data values of many
different types, and not all computers store primitive values such as integers in the
same order. The representation of floating-point numbers also differs between
architectures. There are two variants for the ordering of integers: the so-called big-
endian order, in which the most significant byte comes first; and little-endian order, in
which it comes last.
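The two byte orderings can be seen directly with java.nio.ByteBuffer, as in this small sketch. The class name ByteOrderDemo and the sample value 0x0A0B0C0D are chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Encodes a 32-bit integer into four bytes using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Renders bytes as two hex digits each, in the order they would be transmitted.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        System.out.println("big-endian:    " + hex(encode(value, ByteOrder.BIG_ENDIAN)));    // 0a0b0c0d
        System.out.println("little-endian: " + hex(encode(value, ByteOrder.LITTLE_ENDIAN))); // 0d0c0b0a
    }
}
```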
33. ● Another issue is the set of codes used to represent characters: for example, the
majority of applications on systems such as UNIX use ASCII character coding, taking
one byte per character, whereas the Unicode standard allows for the representation of
texts in many different languages and takes up to four bytes per character, depending on
the encoding (two bytes for most characters in UTF-16).
One of the following methods can be used to enable any two computers to exchange binary
data values:
• The values are converted to an agreed external format before transmission and
converted to the local form on receipt; if the two computers are known to be the
same type, the conversion to external format can be omitted.
• The values are transmitted in the sender’s format, together with an indication of
the format used, and the recipient converts the values if necessary.
34. ● An agreed standard for the representation of data structures and
primitive values is called an external data representation.
● Marshalling is the process of taking a collection of data items and
assembling them into a form suitable for transmission in a message.
● Unmarshalling is the process of disassembling them on arrival to produce
an equivalent collection of data items at the destination.
● Thus marshalling consists of the translation of structured data items
and primitive values into an external data representation.
● Similarly, unmarshalling consists of the generation of primitive values
from their external data representation and the rebuilding of the data
structures.
35. Three alternative approaches to external data representation and marshalling
are discussed:
● CORBA’s common data representation, which is concerned with an external
representation for the structured and primitive types that can be passed as the
arguments and results of remote method invocations in CORBA.
● Java’s object serialization, which is concerned with the flattening and external
data representation of any single object or tree of objects that may need to be
transmitted in a message or stored on a disk. It is for use only by Java.
● XML (Extensible Markup Language), which defines a textual format for
representing structured data. It was originally intended for documents, for example
those accessible on the Web, but it is now also used to represent the data sent in
messages exchanged by clients and servers in web services.
36. ● In the first two approaches, the primitive data types are marshalled into a binary form.
In the third approach (XML), the primitive data types are represented textually.
● The textual representation of a data value will generally be longer than the equivalent
binary representation
● Another issue with regard to the design of marshalling methods is whether the
marshalled data should include information concerning the type of its contents. For
example, CORBA’s representation includes just the values of the objects
transmitted, and nothing about their types.
● On the other hand, both Java serialization and XML do include type information, but in
different ways. Java puts all of the required type information into the serialized form,
but XML documents may refer to externally defined sets of names (with types) called
namespaces.
● Two other techniques for external data representation are worthy of mention. Google
uses an approach called protocol buffers to capture representations of both stored and
transmitted data. There is also considerable interest in JSON (JavaScript Object
Notation) as an approach to external data representation [www.json.org]. Protocol
buffers and JSON represent a step towards more lightweight approaches to data
representation
37. 1. CORBA’s Common Data Representation (CDR)
● CORBA CDR is the external data representation defined with CORBA 2.0
[OMG 2004a].
● CDR can represent all of the data types that can be used as arguments
and return values in remote invocations in CORBA.
● These consist of 15 primitive types, which include short (16-bit), long
(32-bit), unsigned short, unsigned long, float (32-bit), double (64-bit),
char, boolean (TRUE, FALSE), octet (8-bit), and any (which can represent
any basic or constructed type); together with a range of composite types.
● Each argument or result in a remote invocation is represented by a
sequence of bytes in the invocation or result message
38.
39. Primitive types:
● CDR defines a representation for both big-endian and little-endian orderings.
● Values are transmitted in the sender’s ordering, which is specified in each
message. The recipient translates if it requires a different ordering.
● For example, a 16-bit short occupies two bytes in the message, and for big-
endian ordering, the most significant bits occupy the first byte and the least
significant bits occupy the second byte.
● Each primitive value is placed at an index in the sequence of bytes according
to its size. Suppose that the sequence of bytes is indexed from zero
upwards. Then a primitive value of size n bytes (where n = 1, 2, 4 or 8) is
appended to the sequence at an index that is a multiple of n in the stream of
bytes.
● Floating-point values follow the IEEE standard, in which the sign, exponent
and fractional part are in bytes 0–n for big-endian ordering and the other
way round for little-endian.
● Characters are represented by a code set agreed between client and server.
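The alignment rule for primitive values (a value of size n bytes starts at an index that is a multiple of n) amounts to a small calculation. The sketch below is an illustration only; the class and method names are not part of any CORBA API.

```java
class CdrAlignment {
    // Returns the smallest index >= index that is a multiple of size (size is 1, 2, 4 or 8).
    static int align(int index, int size) {
        int remainder = index % size;
        return remainder == 0 ? index : index + (size - remainder);
    }

    public static void main(String[] args) {
        // After 5 bytes of data, a 4-byte value is placed at index 8, leaving 3 bytes of padding.
        System.out.println("4-byte value after 5 bytes starts at " + align(5, 4));
        // An index that is already a multiple of the size needs no padding.
        System.out.println("4-byte value after 8 bytes starts at " + align(8, 4));
        System.out.println("8-byte value after 9 bytes starts at " + align(9, 8));
    }
}
```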
40. Constructed types:
● The primitive values that comprise each constructed type are added to a
sequence of bytes in a particular order, as shown in Figure 4.7.
● Figure 4.8 shows a message in CORBA CDR that contains the three fields
of a struct whose respective types are string, string and unsigned long.
● The figure shows the sequence of bytes with four bytes in each row. The
representation of each string consists of an unsigned long representing
its length followed by the characters in the string.
● For simplicity, we assume that each character occupies just one byte.
Variable-length data is padded with zeros so that it has a standard form,
enabling marshalled data or its checksum to be compared. Note that each
unsigned long, which occupies four bytes, starts at an index that is a multiple of four.
41.
42. ● Another example of an external data representation is the Sun XDR
standard, which is specified in RFC 1832 [Srinivasan 1995b] and described in
www.cdk5.net/ipc.
● It was developed by Sun for use in the messages exchanged between clients
and servers in Sun NFS
● The type of a data item is not given with the data representation in the
message in either the CORBA CDR or the Sun XDR standard. This is because
it is assumed that the sender and recipient have common knowledge of the
order and types of the data items in a message.
● In particular, for RMI or RPC, each method invocation passes arguments of
particular types, and the result is a value of a particular type.
43. Marshalling in CORBA
● Marshalling operations can be generated automatically from the specification of the types of
data items to be transmitted in a message.
● The types of the data structures and the types of the basic data items are described in
CORBA IDL which provides a notation for describing the types of the arguments and results
of RMI methods. For example, we might use CORBA IDL to describe the data structure in the
message in Figure 4.8 as follows:
struct Person{
    string name;
    string place;
    unsigned long year;
};
● The CORBA interface compiler generates appropriate marshalling and unmarshalling
operations for the arguments and results of remote methods from the definitions of the
types of their parameters and results.
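To give a feel for what generated marshalling code has to do, the following hand-written sketch flattens the three fields of Person in the simplified layout described in these slides: each string as an unsigned long length followed by one-byte characters, zero-padded to a four-byte boundary, and the year as a four-byte unsigned long. It is not the output of a CORBA interface compiler and omits details of the real format (such as the byte-order indication); big-endian order, the class name and the sample values are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class CdrPersonSketch {
    // Pads the output with zero bytes until its length is a multiple of n.
    static void pad(ByteArrayOutputStream out, int n) {
        while (out.size() % n != 0) {
            out.write(0);
        }
    }

    // Writes a 32-bit unsigned value at a four-byte boundary (ByteBuffer is big-endian by default).
    static void writeULong(ByteArrayOutputStream out, long value) {
        pad(out, 4);
        out.write(ByteBuffer.allocate(4).putInt((int) value).array(), 0, 4);
    }

    // Writes the length, then the characters, then zero padding up to a four-byte boundary.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] chars = s.getBytes(StandardCharsets.US_ASCII);
        writeULong(out, chars.length);
        out.write(chars, 0, chars.length);
        pad(out, 4);
    }

    // Marshals the fields in the order they are declared in the struct.
    static byte[] marshalPerson(String name, String place, long year) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeString(out, place);
        writeULong(out, year);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] message = marshalPerson("Smith", "London", 1984);   // illustrative values
        System.out.println("marshalled length: " + message.length + " bytes");
    }
}
```

No type information is written: the receiver must already know that the message holds two strings followed by an unsigned long.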
44. 2. Java object serialization
● In Java RMI, both objects and primitive data values may be passed as
arguments and results of method invocations.
● An object is an instance of a Java class. For example, the Java class
equivalent to the Person struct defined in CORBA IDL might be:
import java.io.Serializable;
public class Person implements Serializable {
    private String name;
    private String place;
    private int year;
    public Person(String aName, String aPlace, int aYear) {
        name = aName;
        place = aPlace;
        year = aYear;
    }
    // followed by methods for accessing the instance variables
}
45. ● In Java, the term serialization refers to the activity of flattening an object
or a connected set of objects into a serial form that is suitable for storing on
disk or transmitting in a message, for example, as an argument or the result
of an RMI.
● Deserialization consists of restoring the state of an object or a set of
objects from their serialized form. It is assumed that the process that does
the deserialization has no prior knowledge of the types of the objects in the
serialized form. Therefore some information about the class of each object
is included in the serialized form. This information enables the recipient to
load the appropriate class when an object is deserialized.
● The information about a class consists of the name of the class and a version
number.
● The version number is intended to change when major changes are made to
the class. It can be set by the programmer or calculated automatically as a
hash of the name of the class and its instance variables, methods and
interfaces. The process that deserializes an object can check that it has the
correct version of the class.
46. ● Java objects can contain references to other objects. When an object is serialized, all
the objects that it references are serialized together with it to ensure that when the
object is reconstructed, all of its references can be fulfilled at the destination.
References are serialized as handles.
● In this case, the handle is a reference to an object within the serialized form – for
example, the next number in a sequence of positive integers. The serialization
procedure must ensure that there is a 1–1 correspondence between object references
and handles.
● It must also ensure that each object is written once only – on the second or subsequent
occurrence of an object, the handle is written instead of the object.
● To serialize an object, its class information is written out, followed by the types and
names of its instance variables.
● Each class is given a handle, and no class is written more than once to the stream of
bytes (the handles being written instead where necessary).
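The effect of handles can be observed with the standard serialization classes. The following is a minimal sketch, assuming a simplified Person class and a hypothetical helper named roundTrip: an array holding the same object twice is serialized, and after deserialization both slots still refer to one object, because the second occurrence was written as a handle rather than as a second copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SharedReferenceDemo {
    // Simplified stand-in for the Person class used in the slides.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Person(String name) { this.name = name; }
    }

    /** Serializes the array to bytes and reads it back (hypothetical helper). */
    static Person[] roundTrip(Person[] people) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(people);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Person[]) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person smith = new Person("Smith");
        Person[] copy = roundTrip(new Person[] { smith, smith });
        // Both slots refer to one reconstructed object, not to two copies.
        System.out.println(copy[0] == copy[1]);
    }
}
```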
47. ● The contents of the instance variables that are primitive types, such as
integers, chars, booleans, bytes and longs, are written in a portable
binary format using methods of the ObjectOutputStream class.
● Strings and characters are written by its writeUTF method using a
variant of UTF-8 (Unicode Transformation Format, 8-bit), which enables
ASCII characters to be represented unchanged (in one byte), whereas
other Unicode characters are represented by multiple bytes. Strings are
preceded by the number of bytes they occupy in the stream.
● As an example, consider the serialization of the following object:
Person p = new Person("Smith", "London", 1984);
48.
49. ● To make use of Java serialization, for example to serialize the Person
object, create an instance of the class ObjectOutputStream and invoke
its writeObject method, passing the Person object as its argument.
● To deserialize an object from a stream of data, open an
ObjectInputStream on the stream and use its readObject method to
reconstruct the original object.
● The use of this pair of classes is similar to the use of DataOutputStream
and DataInputStream.
● Serialization and deserialization of the arguments and results of remote
invocations are generally carried out automatically by the middleware,
without any participation by the application programmer.
● If necessary, programmers with special requirements may write their
own version of the methods that read and write objects.
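The two steps above can be sketched as follows. This is an illustrative example rather than middleware code: the byte-array streams stand in for a network connection, and the helper names serialize and deserialize are assumptions introduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class PersonSerializationDemo {
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;
        private final String place;
        private final int year;

        Person(String aName, String aPlace, int aYear) {
            name = aName;
            place = aPlace;
            year = aYear;
        }
        String getName() { return name; }
        String getPlace() { return place; }
        int getYear() { return year; }
    }

    /** Flattens the object into its serialized form using writeObject. */
    static byte[] serialize(Person p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Restores an object from its serialized form using readObject. */
    static Person deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Person) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person copy = deserialize(serialize(new Person("Smith", "London", 1984)));
        System.out.println(copy.getName() + ", " + copy.getPlace() + ", " + copy.getYear());
    }
}
```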
50. The use of reflection
● The Java language supports reflection – the ability to enquire about the properties of a
class, such as the names and types of its instance variables and methods.
● It also enables classes to be created from their names, and a constructor with given
argument types to be created for a given class.
● Java object serialization uses reflection to find out the class name of the object to be
serialized and the names, types and values of its instance variables. That is all that is
needed for the serialized form.
● For deserialization, the class name in the serialized form is used to create a class. This
is then used to create a new constructor with argument types corresponding to those
specified in the serialized form. Finally, the new constructor is used to create a new
object with instance variables whose values are read from the serialized form.
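The reflective operations described above can be tried directly with java.lang.reflect. The sketch below mirrors the description on this slide rather than the exact mechanism inside the JDK's serialization code; the helper names describeFields, create and read are assumptions made for illustration.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class ReflectionDemo {
    static class Person {
        private String name;
        private String place;
        private int year;

        public Person(String aName, String aPlace, int aYear) {
            name = aName;
            place = aPlace;
            year = aYear;
        }
    }

    /** Returns "type name;" for every instance variable declared by the class. */
    static String describeFields(Class<?> type) {
        StringBuilder sb = new StringBuilder();
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            sb.append(f.getType().getSimpleName()).append(' ').append(f.getName()).append(';');
        }
        return sb.toString();
    }

    /** Loads a class by name, finds a constructor by argument types and invokes it. */
    static Object create(String className, Class<?>[] argTypes, Object[] args) {
        try {
            Class<?> type = Class.forName(className);
            Constructor<?> ctor = type.getDeclaredConstructor(argTypes);
            return ctor.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the value of a (possibly private) instance variable by name. */
    static Object read(Object target, String fieldName) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            return f.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFields(Person.class));
        Object p = create(Person.class.getName(),
                new Class<?>[] { String.class, String.class, int.class },
                new Object[] { "Smith", "London", 1984 });
        System.out.println(read(p, "name"));
    }
}
```

Note that the order in which getDeclaredFields returns fields is not specified, so code should not depend on it.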
51. 3.Extensible Markup Language (XML)
● XML is a markup language that was defined by the World Wide Web
Consortium (W3C) for general use on the Web.
● In general, the term markup language refers to a textual encoding that
represents both a text and details as to its structure or its appearance.
● Both XML and HTML were derived from SGML (Standard
Generalized Markup Language) [ISO 8879], a very complex markup
language.
● HTML was designed for defining the appearance of web pages. XML was
designed for writing structured documents for the Web.
● XML is used to enable clients to communicate with web services and for
defining the interfaces and other properties of web services.
52. ● However, XML is also used in many other ways, including in archiving and
retrieval systems – although an XML archive may be larger than a binary
one, it has the advantage of being readable on any computer.
● Other examples of uses of XML include for the specification of user
interfaces and the encoding of configuration files in operating systems.
● XML is extensible in the sense that users can define their own tags, in
contrast to HTML, which uses a fixed set of tags
● However, if an XML document is intended to be used by more than one
application, then the names of the tags must be agreed between them.
● For example, clients usually use SOAP messages to communicate with web
services. SOAP is an XML format whose tags are published for use by
web services and their clients.
53. ● XML documents, being textual, can be read by humans. In practice, most XML documents
are generated and read by XML processing software
● In addition, the use of text makes XML independent of any particular platform. The use
of a textual rather than a binary representation, together with the use of tags, makes
the messages large, so they require longer processing and transmission times, as well as
more space to store
● XML definition of the Person structure
<person id="123456789">
<name>Smith</name>
<place>London</place>
<year>1984</year>
<!-- a comment -->
</person >
54. XML elements and attributes
● Figure 4.10 shows the XML definition of the Person structure that was used to illustrate
marshalling in CORBA CDR and Java.
● It shows that XML consists of tags and character data. The character data, for example
Smith or 1984, is the actual data.
Elements:
● An element in XML consists of a portion of character data surrounded by matching start and
end tags.
● For example, one of the elements in Figure 4.10 consists of the data Smith contained within
the <name> ... </name> tag pair. Note that the element with the <name> tag is enclosed in the
element with the <person id="123456789"> … </person > tag pair.
● The ability of an element to enclose another element allows hierarchic data to be represented
– a very important aspect of XML.
55. Attributes:
A start tag may optionally include pairs of associated attribute names and values such as
id="123456789". An element is generally a container for data, whereas an attribute is used
for labelling that data
Names:
The names of tags and attributes in XML generally start with a letter, but can also start
with an underline or a colon. The names continue with letters, digits, hyphens, underscores,
colons or full stops. Letters are case-sensitive. Names that start with xml are reserved.
Binary data:
All of the information in XML elements must be expressed as character data. Binary values
can be represented in base64 notation [Freed and Borenstein 1996], which uses only the
alphanumeric characters together with +, / and =, where = has a special meaning (padding).
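As a small illustration of the base64 notation mentioned above, the following sketch uses the standard java.util.Base64 class to turn binary data into character data that could be placed inside an XML element, and back again. The class name Base64Demo and the sample bytes are assumptions chosen for the example.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64Demo {
    /** Encodes arbitrary bytes as base64 text. */
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Decodes base64 text back into the original bytes. */
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] binary = { (byte) 0xFF, 0x00, 0x10 };
        String text = encode(binary);
        System.out.println("<data>" + text + "</data>");
        System.out.println(Arrays.equals(binary, decode(text)));
    }
}
```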
56. Parsing and well-formed documents
● An XML document must be well formed – that is, it must conform to rules about its structure.
● A basic rule is that every start tag has a matching end tag.
● Another basic rule is that all tags are correctly nested – for example, <x>..<y>..</y>..</x> is
correct, whereas <x>..<y>..</x>..</y> is not.
● Finally, every XML document must have a single root element that encloses all the other
elements. These rules make it very simple to implement parsers for XML documents. When a
parser reads an XML document that is not well formed, it will report a fatal error.
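The well-formedness rules can be checked with the XML parser that ships with Java. This is a minimal sketch, assuming a hypothetical helper named isWellFormed; the parser signals a fatal error by throwing a SAXException, which the helper turns into a boolean.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedDemo {
    /** Returns true if the parser accepts the document, false on a fatal error. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document is not well formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<x><y></y></x>")); // correctly nested
        System.out.println(isWellFormed("<x><y></x></y>")); // incorrectly nested
        System.out.println(isWellFormed("<a/><b/>"));       // no single root element
    }
}
```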
CDATA:
● XML parsers normally parse the contents of elements because they may contain further
nested structures. If text needs to contain an angle bracket or a quote, it may be represented
in a special way: for example, &lt; represents the opening angle bracket. If a section should
not be parsed (for example, because it contains special characters), it can be denoted as
CDATA. For example, if a place name is to include an apostrophe, then it could be specified
in either of the two following ways:
<place> King&apos;s Cross </place >
<place> <![CDATA[King's Cross]]></place >
57. XML prolog:
● An XML document should begin with a prolog (the XML declaration) as its
first line. The prolog must at least specify the version of XML in use
(which is currently 1.0).
● For example:
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes"?>
● The prolog may specify the encoding (UTF-8 is the default). The term
encoding refers to the set of codes used to represent characters, ASCII
being the best-known example. Note that in the XML prolog, ASCII is
specified as us-ascii.
● Other possible encodings include ISO-8859-1 and various other 8-bit
encodings for representing other alphabets, for example, Greek
58. XML namespaces
● Traditionally, namespaces provide a means for scoping names.
● An XML namespace is a set of names for a collection of element types and attributes
that is referenced by a URL. Any other XML document can use an XML namespace by
referring to its URL.
● Any element that makes use of an XML namespace can specify that namespace as an
attribute called xmlns, whose value is a URL referring to the file containing the
namespace definitions. For example:
xmlns:pers = "http://www.cdk5.net/person"
<person pers:id="123456789" xmlns:pers = "http://www.cdk5.net/person">
<pers:name> Smith </pers:name>
<pers:place> London </pers:place >
<pers:year> 1984 </pers:year>
</person>
59. XML schemas
● An XML schema [www.w3.org VIII] defines the elements and attributes that
can appear in a document, how the elements are nested and the order and
number of elements, and whether an element is empty or can include text.
● For each element, it defines the type and default value
An XML schema for the Person structure
<xsd:schema xmlns:xsd = URL of XML schema definitions >
  <xsd:element name= "person" type ="personType" />
  <xsd:complexType name="personType">
    <xsd:sequence>
      <xsd:element name = "name" type="xsd:string"/>
      <xsd:element name = "place" type="xsd:string"/>
      <xsd:element name = "year" type="xsd:positiveInteger"/>
    </xsd:sequence>
    <xsd:attribute name= "id" type = "xsd:positiveInteger"/>
  </xsd:complexType>
</xsd:schema>
60. Document type definitions:
● Document type definitions (DTDs) [www.w3.org VI] were provided as a part
of the XML 1.0 specification for defining the structure of XML documents
and are still widely used for that purpose.
● The syntax of DTDs is different from the rest of XML and it is quite limited
in what it can specify; for example, it cannot describe data types and its
definitions are global, preventing element names from being duplicated.
● DTDs are not used for defining web services
APIs for accessing XML
● XML parsers and generators are available for most commonly used
programming languages.
● For example, there is Java software for writing out Java objects as XML
(marshalling) and for creating Java objects from such structures
(unmarshalling). Similar software is available in Python for Python data types
and objects
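As one example of such an API, Java includes a DOM parser in javax.xml.parsers. The sketch below parses the Person document shown earlier and reads its attribute and element values; the class name PersonXmlDemo and the helper childText are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class PersonXmlDemo {
    static final String PERSON_XML =
            "<person id=\"123456789\">"
            + "<name>Smith</name>"
            + "<place>London</place>"
            + "<year>1984</year>"
            + "<!-- a comment -->"
            + "</person >";

    /** Parses the text into a DOM tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the character data of the first element with the given tag name. */
    static String childText(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(PERSON_XML);
        System.out.println(doc.getDocumentElement().getAttribute("id"));
        System.out.println(childText(doc, "name") + ", " + childText(doc, "place")
                + ", " + Integer.parseInt(childText(doc, "year")));
    }
}
```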
61. 4.Remote object references
● When a client invokes a method in a remote object, an invocation message is sent to the
server process that hosts the remote object.
● This message needs to specify which particular object is to have its method invoked.
● A remote object reference is an identifier for a remote object that is valid
throughout a distributed system. A remote object reference is passed in the invocation
message to specify which object is to be invoked.
● In general, there may be many processes hosting remote objects, so remote object
references must be unique among all of the processes in the various computers in a
distributed system.
● Even after the remote object associated with a given remote object reference is
deleted, it is important that the remote object reference is not reused, because its
potential invokers may retain obsolete remote object references
62.
63. Client-Server Communication
● This form of communication is designed to support the
roles and message exchanges in client-server interaction.
● Request-reply communication is synchronous in the normal case
because the client process blocks until the reply arrives from the
server. It can also be reliable, because the reply from the server is
effectively an acknowledgement to the client.
● Asynchronous request-reply communication may be useful where
clients can afford to retrieve replies later.
64. The client-server exchanges are described in the following paragraphs in terms of
the send and receive operations in the Java API for UDP datagrams, although many
current implementations use TCP streams. A protocol built over datagrams avoids
unnecessary overheads associated with the TCP stream protocol.
In particular:
• Acknowledgements are redundant, since requests are followed by replies.
• Establishing a connection involves two extra pairs of messages in addition to the
pair required for a request and a reply.
• Flow control is redundant for the majority of invocations, which pass only small
arguments and results.
65.
66. • The figure shows request-reply communication based on three communication primitives:
doOperation, getRequest and sendReply.
• This protocol is used in most RMI and RPC systems.
• The doOperation method is used by clients to invoke remote operations. Its
arguments specify the remote server and which operation to invoke, together with
additional information (arguments) required by the operation.
• After sending the request message, doOperation invokes receive to wait for a reply
message.
• getRequest is used by a server process to acquire service requests.
• sendReply is used by the server to send the reply message to the client once the
operation has been executed.
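The interplay of the three primitives can be sketched in a single process. This is a simplified illustration, not the real protocol: two in-memory queues stand in for UDP datagrams, requests carry only an operation identifier and one integer argument, and the operation identifier SQUARE is hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class RequestReplyDemo {
    // Hypothetical operation identifier.
    static final int SQUARE = 1;

    // In-memory queues standing in for the network.
    private final BlockingQueue<int[]> requests = new LinkedBlockingQueue<>();
    private final BlockingQueue<Integer> replies = new LinkedBlockingQueue<>();

    /** Client side: send the request, then block until the reply arrives. */
    int doOperation(int operationId, int argument) {
        try {
            requests.put(new int[] { operationId, argument });
            return replies.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Server side: block until a request is available. */
    int[] getRequest() throws InterruptedException {
        return requests.take();
    }

    /** Server side: send the result back to the waiting client. */
    void sendReply(int result) throws InterruptedException {
        replies.put(result);
    }

    /** Starts a server loop: getRequest, execute the operation, sendReply. */
    void startServer() {
        Thread server = new Thread(() -> {
            try {
                while (true) {
                    int[] request = getRequest();
                    int result = request[0] == SQUARE ? request[1] * request[1] : 0;
                    sendReply(result);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        server.setDaemon(true);
        server.start();
    }

    public static void main(String[] args) {
        RequestReplyDemo channel = new RequestReplyDemo();
        channel.startServer();
        System.out.println(channel.doOperation(SQUARE, 7));
    }
}
```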
67.
68. The information to be transmitted in a request message or a reply message is
shown in Figure 5.4
69. Message identifiers
● Any scheme that involves the management of messages to provide
additional properties such as reliable message delivery or request-reply
communication requires that each message have a unique message
identifier by which it may be referenced.
● A message identifier consists of two parts:
1. a requestId, which is taken from an increasing sequence of integers by
the sending process;
2. an identifier for the sender process, for example, its port and
Internet address.
70. Failure model of the request-reply protocol
• If the three primitives doOperation, getRequest and sendReply are
implemented over UDP datagrams, then they suffer from the same
communication failures. That is:
• They suffer from omission failures.
• Messages are not guaranteed to be delivered in sender order.
71. Preventive methods
Timeouts - There are various options as to what doOperation can do
after a timeout. To compensate for lost messages, doOperation sends the
request repeatedly until either it gets a reply or it is reasonably sure
that the delay is due to lack of response from the server rather than
to lost messages.
Discarding duplicate request messages - The protocol is designed to
recognize successive messages from the same client with the same
request identifier and to filter out duplicates.
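The retransmission strategy can be sketched without a network. In the example below, each attempt is modelled as a function that returns an empty Optional when the timeout expires; with real UDP sockets the timeout would typically be set with DatagramSocket.setSoTimeout. The names doOperation (in this simplified form), lossy and maxAttempts are assumptions made for illustration.

```java
import java.util.Optional;
import java.util.function.Function;

class RetryDemo {
    /**
     * Sends the request up to maxAttempts times. Each attempt yields a reply,
     * or an empty Optional if the (simulated) timeout expired first.
     */
    static String doOperation(Function<String, Optional<String>> attempt,
                              String request, int maxAttempts) {
        for (int i = 1; i <= maxAttempts; i++) {
            Optional<String> reply = attempt.apply(request);
            if (reply.isPresent()) {
                return reply.get();
            }
            // Timed out: fall through and retransmit the same request.
        }
        throw new IllegalStateException("no reply after " + maxAttempts + " attempts");
    }

    /** Simulated channel that loses the first `losses` messages. */
    static Function<String, Optional<String>> lossy(int losses) {
        int[] remaining = { losses };
        return request -> remaining[0]-- > 0
                ? Optional.<String>empty()
                : Optional.of("reply to " + request);
    }

    public static void main(String[] args) {
        System.out.println(doOperation(lossy(2), "ping", 3));
    }
}
```

Giving up after a bounded number of attempts corresponds to the case where the client concludes that the server itself is not responding.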
72. Lost reply messages
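● If the server has already sent the reply when it receives a duplicate request,
it will need to execute the operation again to obtain the result, unless it has
stored the result of the original execution.
● Some servers can execute their operations more than once and obtain the same
results each time.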
● An idempotent operation is an operation that can be performed repeatedly with
the same effect as if it had been performed exactly once. For example, an
operation to add an element to a set is an idempotent operation because it will
always have the same effect on the set each time it is performed, whereas an
operation to append an item to a sequence is not an idempotent operation
because it extends the sequence each time it is performed.
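The two operations named above can be compared directly. This short sketch repeats each operation several times, as a server would if it re-executed duplicate requests; the class and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IdempotenceDemo {
    /** Repeats "add element to set" and returns the resulting size. */
    static int sizeAfterSetAdds(int times) {
        Set<String> set = new HashSet<>();
        for (int i = 0; i < times; i++) {
            set.add("x"); // idempotent: repeating it changes nothing
        }
        return set.size();
    }

    /** Repeats "append item to sequence" and returns the resulting length. */
    static int sizeAfterAppends(int times) {
        List<String> sequence = new ArrayList<>();
        for (int i = 0; i < times; i++) {
            sequence.add("x"); // not idempotent: each repeat extends the sequence
        }
        return sequence.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterSetAdds(3));
        System.out.println(sizeAfterAppends(3));
    }
}
```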
73. History
● For servers that require retransmission of replies without re-execution
of operations, a history may be used.
● The term ‘history’ is used to refer to a structure that contains a record
of (reply) messages that have been transmitted.
● An entry in a history contains a request identifier, a message and an
identifier of the client to which it was sent.
● A problem associated with the use of a history is its memory cost.
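A history can be sketched as a map from message identifier to the reply that was sent. In the example below the server performs a non-idempotent append; a duplicate request is answered from the history instead of being re-executed. The key format (client identifier plus requestId) and the class names are assumptions, and the sketch never discards entries, which is exactly the memory cost noted above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HistoryDemo {
    static class Server {
        // History: message identifier -> reply already transmitted.
        private final Map<String, String> history = new HashMap<>();
        // Server state changed by a non-idempotent operation (append).
        private final List<String> sequence = new ArrayList<>();

        /** Handles an append request, retransmitting the stored reply for duplicates. */
        String handle(String clientId, int requestId, String item) {
            String key = clientId + "#" + requestId;
            String saved = history.get(key);
            if (saved != null) {
                return saved; // duplicate request: do not execute again
            }
            sequence.add(item);
            String reply = "appended; length=" + sequence.size();
            history.put(key, reply);
            return reply;
        }

        int executions() {
            return sequence.size();
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        System.out.println(server.handle("client1", 1, "a"));
        System.out.println(server.handle("client1", 1, "a")); // retransmitted request
        System.out.println(server.executions());
    }
}
```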
74. Styles of exchange protocols
• Three protocols that produce differing behaviours in the presence of
communication failures are used for implementing various types of
request behaviour.
They were originally identified by Spector [1982]:
• the request (R) protocol;
• the request-reply (RR) protocol;
• the request-reply-acknowledge reply (RRA) protocol.
75.
76. Use of TCP streams to implement the request-reply protocol
• Datagrams have a limited length (usually 8 kilobytes), which may not be adequate for
the arguments and results of RMI.
• TCP streams can be used instead.
• TCP has the following advantages:
• It makes it possible to transmit arguments and results of any size (for example, 20 KB to 60 KB).
• There is no need for the request-reply protocol to deal with retransmission.
• The flow-control mechanism allows large arguments and results to be passed.
• Thus, TCP is often chosen for implementing request-reply protocols. It is more costly than
UDP, but the request-reply protocol no longer needs to deal with retransmission and
duplicate filtering.
77. HTTP: An example of a request-reply protocol
● web servers manage resources implemented in different ways:
• as data – for example the text of an HTML page, an image or the class of an
applet;
• as a program – for example, servlets [java.sun.com III], or PHP or Python
programs that run on the web server
● HTTP protocol allows for content negotiation and password-style authentication:
Content negotiation: Clients’ requests can include information as to what data
representations they can accept (for example, language or media type), enabling
the server to choose the representation that is the most appropriate for the user.
Authentication: Credentials and challenges are used to support password-style
authentication.
78. HTTP is implemented over TCP. In the original version of the protocol, each
client-server interaction consisted of the following steps:
• The client requests and the server accepts a connection at the default server port or at a
port specified in the URL.
• The client sends a request message to the server.
• The server sends a reply message to the client.
• The connection is closed.
• Resources implemented as data are supplied as MIME-like structures. Multipurpose
Internet Mail Extensions (MIME) is a standard for sending multipart data containing text,
images and sound in e-mail messages.
• Data is prefixed with its MIME type so that the recipient will know how to handle it. A
MIME type specifies a type and a subtype, for example, text/plain, text/html, image/gif,
image/jpeg.
79. The HTTP methods include
•GET – The GET method is used to retrieve information from the given server using a
given URL. Requests using GET should only retrieve data and should have no other effect
on the data.
•HEAD – the request is identical to GET, but it does not return any data. Instead, it returns
all the information about the data (time of last modification, type, size).
•POST - A POST request is used to send data to the server, for example, customer
information, file upload, etc. using HTML forms.
•PUT – requests that the data supplied in the request is stored with the given URL as
identifier. Replaces all current representations of the target resource with the uploaded
content.
•DELETE – the server deletes the resource identified by the given URL .
•OPTIONS – the server supplies the client with a list of methods it allows to be
applied to the given URL and its special requirements.
•TRACE – the server sends back the request message. Used for diagnostic purposes.
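To make the request message concrete, the following sketch builds the text of an HTTP/1.1 request with a chosen method and two content-negotiation headers. It only formats the message and does not open a connection; the host www.example.org, the path and the helper name request are hypothetical.

```java
class HttpRequestDemo {
    /** Builds the text of a request message: request line, headers, blank line. */
    static String request(String method, String path, String host,
                          String accept, String language) {
        return method + " " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Accept: " + accept + "\r\n"              // acceptable media types
                + "Accept-Language: " + language + "\r\n"   // preferred language
                + "\r\n";
    }

    public static void main(String[] args) {
        System.out.print(request("GET", "/index.html", "www.example.org", "text/html", "en"));
        System.out.print(request("HEAD", "/index.html", "www.example.org", "text/html", "en"));
    }
}
```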
81. Group Communication
● Group communication provides our first example of an indirect
communication paradigm.
● Group communication offers a service whereby a message is sent to a
group and then this message is delivered to all members of the group.
● Group communication represents an abstraction over multicast
communication and may be implemented over IP multicast or an equivalent
overlay network, adding significant extra value in terms of managing
group membership, detecting failures and providing reliability and
ordering guarantees.
82. Group communication is an important building block for distributed systems, and particularly
reliable distributed systems, with key areas of application including:
• the reliable dissemination of information to potentially large numbers of clients, including in
the financial industry, where institutions require accurate and up-to-date access to a wide
variety of information sources;
• support for collaborative applications, where again events must be disseminated to multiple
users to preserve a common user view – for example, in multiuser games, as discussed in
Chapter 1;
• support for a range of fault-tolerance strategies, including the consistent update of
replicated data (as discussed in detail in Chapter 18) or the implementation of highly available
(replicated) servers;
• support for system monitoring and management, including for example load balancing
strategies.
83. The programming model:
● In group communication, the central concept is that of a group with
associated group membership, whereby processes may join or leave the
group.
● Processes can then send a message to this group and have it propagated
to all members of the group with certain guarantees in terms of
reliability and ordering.
● Thus, group communication implements multicast communication, in which
a message is sent to all the members of the group by a single operation.
● Communication to all processes in the system, as opposed to a subgroup
of them, is known as broadcast, whereas communication to a single
process is known as unicast.
84.
85. ● The essential feature of group communication is that a process issues
only one multicast operation to send a message to each of a group of
processes (in Java this operation is aGroup.send(aMessage)) instead of
issuing multiple send operations to individual processes.
● The use of a single multicast operation instead of multiple send
operations amounts to much more than a convenience for the
programmer:
● it enables the implementation to be efficient in its utilization of
bandwidth. It can take steps to send the message no more than once over
any communication link, by sending it over a distribution tree; and it can
use network hardware support for multicast where this is available. The
implementation can also minimize the total time taken to deliver the
message to all destinations, as compared with transmitting it separately
and serially.
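The programming model can be illustrated with a small in-memory group. This sketch shows only the interface, in which a single send call reaches every current member; it loops over the members internally and so gains none of the bandwidth benefits of a real multicast implementation. The Group class and its join, leave and send methods are assumptions modelled on aGroup.send(aMessage).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

class GroupDemo {
    static class Group {
        // Member identifier -> callback used to deliver messages to that member.
        private final Map<String, Consumer<String>> members = new LinkedHashMap<>();

        void join(String memberId, Consumer<String> deliver) {
            members.put(memberId, deliver);
        }

        void leave(String memberId) {
            members.remove(memberId);
        }

        /** One operation for the sender; the message is delivered to every member. */
        void send(String message) {
            for (Consumer<String> deliver : members.values()) {
                deliver.accept(message);
            }
        }
    }

    public static void main(String[] args) {
        Group aGroup = new Group();
        aGroup.join("p1", m -> System.out.println("p1 delivers " + m));
        aGroup.join("p2", m -> System.out.println("p2 delivers " + m));
        aGroup.send("hello");
        aGroup.leave("p2");
        aGroup.send("goodbye");
    }
}
```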
86. Process groups and object groups
● Most work on group services focuses on the concept of process groups, that is, groups
where the communicating entities are processes. Such services are relatively low-level
in that:
○ Messages are delivered to processes and no further support for dispatching is
provided.
○ Messages are typically unstructured byte arrays with no support for marshalling of
complex data types
○ In contrast, object groups provide a higher-level approach to group computing. An
object group is a collection of objects that process the same set of invocations
concurrently, with each returning responses.
○ Client objects need not be aware of the replication. They invoke operations on a single,
local object, which acts as a proxy for the group. The proxy uses a group communication
system to send the invocations to the members of the object group.
○ Object parameters and results are marshalled as in RMI and the associated calls are
dispatched automatically to the right destination objects/methods
87. ● Electra [Maffeis 1995] is a CORBA-compliant system that supports object
groups. An Electra group can be interfaced to any CORBA-compliant
application.
● Electra was originally built on top of the Horus group communication system,
which it uses to manage the membership of the group and to multicast
invocations.
● In ‘transparent mode’, the local proxy returns the first available response to
a client object. In ‘non-transparent mode’, the client object can access all the
responses returned by the group members.
● Electra uses an extension of the standard CORBA Object Request Broker
interface, with functions for creating and destroying object groups and
managing their membership.
● Eternal [Moser et al. 1998] and the Object Group Service [Guerraoui et al.
1998] also provide CORBA-compliant support for object groups
88. Other key distinctions
Closed and open groups: A group is said to be closed if only members of the group may
multicast to it. A process in a closed group delivers to itself any message that it multicasts
to the group. A group is open if processes outside the group may send to it.
Closed groups of processes are useful, for example, for cooperating servers to send
messages to one another that only they should receive. Open groups are useful, for example,
for delivering events to groups of interested processes.
89. ● Overlapping and non-overlapping groups: In overlapping groups, entities
(processes or objects) may be members of multiple groups, and
non-overlapping groups imply that membership does not overlap (that is, any
process belongs to at most one group).
● Synchronous and asynchronous systems: There is a requirement to
consider group communication in both environments
90. Implementation issues
Reliability:
● Reliability in one-to-one communication was defined in terms of two
properties: integrity (the message received is the same as the one sent, and
no messages are delivered twice) and validity (any outgoing message is
eventually delivered).
● The interpretation for reliable multicast builds on these properties, with
integrity defined the same way in terms of delivering the message correctly
at most once, and validity guaranteeing that a message sent will eventually be
delivered. To extend the semantics to cover delivery to multiple receivers, a
third property is added – that of agreement, stating that if the message is
delivered to one process, then it is delivered to all processes in the group.
91. Ordering in multicast:
● FIFO ordering: First-in-first-out (FIFO) ordering (also referred to as source
ordering) is concerned with preserving the order from the perspective of a
sender process, in that if a process sends one message before another, it will
be delivered in this order at all processes in the group.
● Causal ordering: Causal ordering takes into account causal relationships
between messages, in that if a message happens before another message in
the distributed system this so-called causal relationship will be preserved in
the delivery of the associated messages at all processes
● Total ordering: In total ordering, if a message is delivered before another
message at one process, then the same order will be preserved at all
processes.
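FIFO ordering is commonly achieved with per-sender sequence numbers and a hold-back queue. The sketch below is one such approach under stated assumptions: each sender numbers its messages from 0, and the receiver holds back any message that arrives ahead of its turn. The class name FifoReceiver and its methods are hypothetical, and causal and total ordering need additional machinery not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FifoDeliveryDemo {
    /** Delivers each sender's messages in the order of their sequence numbers. */
    static class FifoReceiver {
        private final Map<String, Integer> nextExpected = new HashMap<>();
        private final Map<String, Map<Integer, String>> heldBack = new HashMap<>();
        private final List<String> delivered = new ArrayList<>();

        /** Called when a message arrives from the network, possibly out of order. */
        void receive(String sender, int seq, String payload) {
            int next = nextExpected.getOrDefault(sender, 0);
            if (seq < next) {
                return; // already delivered: ignore the duplicate
            }
            Map<Integer, String> held = heldBack.computeIfAbsent(sender, s -> new HashMap<>());
            held.put(seq, payload);
            // Deliver every message that is now in sequence.
            while (held.containsKey(next)) {
                delivered.add(held.remove(next));
                next++;
            }
            nextExpected.put(sender, next);
        }

        List<String> delivered() {
            return delivered;
        }
    }

    public static void main(String[] args) {
        FifoReceiver receiver = new FifoReceiver();
        receiver.receive("p", 1, "second"); // held back until message 0 arrives
        receiver.receive("p", 0, "first");
        System.out.println(receiver.delivered());
    }
}
```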
93. A group membership service has four main tasks:
● Providing an interface for group membership changes: The membership
service provides operations to create and destroy process groups and to
add or withdraw a process to or from a group
● Failure detection: The service monitors the group members not only in
case they should crash, but also in case they should become unreachable
because of a communication failure. The detector marks processes as
Suspected or Unsuspected. The service uses the failure detector to
reach a decision about the group’s membership: it excludes a process
from membership if it is suspected to have failed or to have become
unreachable.
94. ● Notifying members of group membership changes: The service notifies the
group’s members when a process is added, or when a process is excluded
(through failure or when the process is deliberately withdrawn from the
group).
● Performing group address expansion: When a process multicasts a message, it
supplies the group identifier rather than a list of processes in the group. The
membership service expands the identifier into the current group membership
for delivery.
95. DISTRIBUTED OBJECTS AND COMPONENTS
● A complete middleware solution must present a higher-level programming
abstraction as well as abstracting over the underlying complexities involved in
distributed systems.
● This chapter examines two of the most important programming abstractions,
namely distributed objects and components, and also examines associated
middleware platforms including CORBA, Enterprise JavaBeans and Fractal
● CORBA is a middleware design that allows application programs to
communicate with one another irrespective of their programming languages,
their hardware and software platforms, the networks they communicate over
and their implementors.
● Applications are built from CORBA objects, which implement interfaces
defined in CORBA’s interface definition language, IDL. Like Java RMI, CORBA
supports transparent invocation of methods on remote objects. The
middleware component that supports RMI is called the object request
broker, or ORB.
96. ● Component-based middleware has emerged as a natural evolution of
distributed objects, providing support for managing dependencies
between components, hiding low-level details associated with the
middleware, managing the complexities of establishing distributed
applications with appropriate non-functional properties such as security,
and supporting appropriate deployment strategies.
● Key technologies in this area include Enterprise JavaBeans and Fractal.
97. Distributed object middleware
● The key characteristic of distributed objects is that they allow you to
adopt an object-oriented programming model for the development of
distributed systems and, through this, hide the underlying complexity of
distributed programming.
● In this approach, communicating entities are represented by objects.
Objects communicate mainly using remote method invocation, but also
possibly using an alternative communication paradigm (such as distributed
events).
98. This relatively simple approach has a number of important benefits, including the
following:
● The encapsulation inherent in object-based solutions is well suited to distributed
programming.
● The related property of data abstraction provides a clean separation between the
specification of an object and its implementation, allowing programmers to deal
solely in terms of interfaces and not be concerned with implementation details
such as the programming language and operating system used.
● This approach also lends itself to more dynamic and extensible solutions, for
example by enabling the introduction of new objects or the replacement of one
object with another (compatible) object.
99. Distributed objects
Middleware based on distributed objects is designed to provide a programming model based on
object-oriented principles and therefore to bring the benefits of the object oriented approach to
distributed programming.
Emmerich [2000] sees such distributed objects as a natural evolution from three strands of
activity:
● In distributed systems, earlier middleware was based on the client-server model and there
was a desire for more sophisticated programming abstractions.
● In programming languages, earlier work in object-oriented languages such as Simula-67 and
Smalltalk led to the emergence of more mainstream and heavily used programming languages
such as Java and C++ (languages used extensively in distributed systems).
● In software engineering, significant progress was made in the development of object-oriented
design methods, leading to the emergence of the Unified Modelling Language (UML) as an
industrial-standard notation for specifying object-oriented software systems.
100. ● Distributed object middleware offers a programming
abstraction based on object oriented principles.
● Leading examples of distributed object middleware include
Java RMI and CORBA.
● While Java RMI and CORBA share a lot in common, there is
one important difference:
○ the use of Java RMI is restricted to Java-based development,
○ whereas CORBA is a multi-language solution allowing objects written
in a variety of languages to interoperate. (Bindings exist for C++,
Java, Python and several others.)
101.
102. ● Class is a fundamental concept in object-oriented languages but does not feature so
prominently in distributed object middleware.
● As noted in the CORBA case study, it is difficult to agree upon a common
interpretation of class in a heterogeneous environment where multiple languages
coexist.
● In the object oriented world more generally, class has several interpretations, including
the description of the behaviour associated with a group of objects (the template used
to create an object from the class), the place to go to instantiate an object with a given
behaviour (the associated factory) or even the group of objects that adhere to that
behaviour.
● While the term ‘class’ is avoided, more specific terms such as ‘factory’ and ‘template’
are readily used (a factory being an object that will instantiate a new object from a
given template).
103. ● The style of inheritance is significantly different from that offered in most object
oriented languages.
● In particular, distributed object middleware offers interface inheritance, which is a
relationship between interfaces whereby the new interface inherits the method
signatures of the original interface and can add extra ones.
● In contrast, object-oriented languages such as Smalltalk offer implementation
inheritance as a relationship between implementations, whereby the new class (in this
case) inherits the implementation (and hence behaviour) of the original class and can
add extra behaviour.
● Implementation inheritance is much more difficult to implement, particularly in
distributed systems, due to the need to resolve the correct executable behaviour at
runtime. Consider, for example, the level of heterogeneity that may exist in a
distributed system, together with the need to implement highly scalable solutions.
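The distinction between interface inheritance and implementation inheritance can be sketched in Java. This is a minimal illustration only: the names Shape, ResizableShape and Square are hypothetical and do not come from any middleware API. The derived interface inherits method signatures and adds one, while no code is inherited.

```java
// Hypothetical interfaces for illustration; not part of Java RMI or CORBA.
interface Shape {
    double area();
}

// Interface inheritance: ResizableShape inherits the signature of area()
// and adds an extra signature. No implementation is inherited.
interface ResizableShape extends Shape {
    ResizableShape scaledBy(double factor);
}

// One possible implementation. A remote peer could sit behind the same
// interface with a completely different class, or a different language.
final class Square implements ResizableShape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double area() {
        return side * side;
    }

    @Override
    public ResizableShape scaledBy(double factor) {
        return new Square(side * factor);
    }
}
```

A caller that holds a ResizableShape depends only on the two signatures, which is why this style of inheritance carries over more easily to heterogeneous systems.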
104. The added complexities:
Because of the added complexities involved, the associated distributed
object middleware must provide additional functionality, as summarized
below:
Inter-object communication:
● A distributed object middleware framework must offer one or more
mechanisms for objects to communicate in the distributed environment.
● This is normally provided by remote method invocation, although
distributed object middleware often supplements this with other
communications paradigms (for example, indirect approaches such as
distributed events).
● CORBA provides an event service and an associated notification service,
both implemented as services on top of the core middleware
105. Lifecycle management: Lifecycle management is concerned with the creation, migration and
deletion of objects, with each step having to deal with the distributed nature of the
underlying environment
Activation and deactivation: In non-distributed implementations, it can often be assumed
that objects are active all the time while the process that contains them runs. In distributed
systems, however, this cannot be assumed as the numbers of objects may be very large, and
hence it would be wasteful of resources to have all objects available at any time. In addition,
nodes hosting objects may be unavailable for periods of time. Activation is the process of
making an object active in the distributed environment by providing the necessary resources
for it to process incoming invocations – effectively, locating the object in virtual memory and
giving it the necessary threads to execute. Deactivation is then the opposite process,
rendering an object temporarily unable to process invocations.
106. Persistence: Objects typically have state, and it is important to maintain this
state across possible cycles of activation and deactivation and indeed system
failures. Distributed object middleware must therefore offer persistency
management for stateful objects.
Additional services: A comprehensive distributed object middleware
framework must also provide support for the range of distributed system
services considered in this book, including naming, security and transaction
services.
107. CORBA-Common Object Request Broker Architecture
● The Common Object Request Broker Architecture (CORBA) is a standard
architecture for a distributed objects system.
● CORBA is designed to allow distributed objects to interoperate in a
heterogeneous environment, where objects can be implemented in
different programming languages and/or deployed on different platforms.
CORBA differs from the architecture of Java RMI in one significant aspect:
● RMI is a proprietary facility developed by Sun Microsystems, Inc., and
supports objects written in the Java programming language only.
● CORBA is an architecture that was developed by the Object
Management Group (OMG), an industrial consortium.
108. CORBA Object Interface
● A distributed object is defined using a software file similar to the remote interface
file in Java RMI.
● Since CORBA is language independent, the interface is defined using a universal
language with a distinct syntax, known as the CORBA Interface Definition Language
(IDL).
● The syntax of CORBA IDL is similar to Java and C++.
● However, an object defined in a CORBA IDL file can be implemented in a large number of
diverse programming languages, including C, C++, Java, COBOL, Smalltalk, Ada, Lisp,
Python, and IDLScript.
● For each of these languages, OMG has a standardized mapping from CORBA IDL to the
programming language, so that a compiler can be used to process a CORBA interface to
generate the proxy files needed to interface with an object implementation or an
object client written in any of the CORBA-compatible languages
109. ● A stub is a mechanism that effectively creates and issues requests on behalf of a client.
● A skeleton is a mechanism that delivers requests to the CORBA object implementation.
110.
111. Remote procedure call
● RPC was first introduced by Birrell and Nelson [1984] and paved the way for
many of the developments in distributed systems programming used today.
● Remote Procedure Call (RPC) is a protocol that one program can use to
request a service from a program located in another computer on a network
without having to understand the network's details. With RPC, a procedure in a
process on a remote system is called in the same way as a local procedure.
112.
113. Design issues for RPC
1. The style of programming promoted by RPC – programming with
interfaces;
2. The call semantics associated with RPC;
3. The key issue of transparency and how it relates to remote
procedure calls.
114. 1. Programming with interfaces
● Most modern programming languages provide a means of organizing a program as a set of
modules that can communicate with one another.
● Communication between modules can be by means of procedure calls between modules or
by direct access to the variables in another module.
● In order to control the possible interactions between modules, an explicit interface is
defined for each module.
● The interface of a module specifies the procedures and the variables that can be
accessed from other modules.
● Modules are implemented so as to hide all the information about them except that
which is available through their interfaces. So long as a module's interface remains the
same, the implementation may be changed without affecting the users of the module.
115. Interfaces in distributed systems
● In a distributed program, the modules can run in separate processes. In
the client-server model, in particular, each server provides a set of
procedures that are available for use by clients.
● For example, a file server would provide procedures for reading and
writing files. The term service interface is used to refer to the
specification of the procedures offered by a server, defining the types
of the arguments of each of the procedures.
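A service interface for the file server example might look like the following Java sketch. The names FileService and InMemoryFileService, the parameter lists and the in-memory storage are all assumptions made for illustration. The interface specifies only procedures and argument types; the implementation behind it could be replaced without affecting clients.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical service interface: only procedures and argument types,
// no variables that a client could access directly.
interface FileService {
    byte[] read(String fileId, int offset, int length);
    void write(String fileId, int offset, byte[] data);
}

// One possible implementation, keeping files in memory.
// Offsets and lengths are assumed to be non-negative.
final class InMemoryFileService implements FileService {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    public byte[] read(String fileId, int offset, int length) {
        byte[] content = files.getOrDefault(fileId, new byte[0]);
        int from = Math.min(offset, content.length);
        int to = Math.min(offset + length, content.length);
        return Arrays.copyOfRange(content, from, to);
    }

    @Override
    public void write(String fileId, int offset, byte[] data) {
        byte[] content = files.getOrDefault(fileId, new byte[0]);
        // Grow the file if the write extends past its current end.
        byte[] grown = Arrays.copyOf(content, Math.max(content.length, offset + data.length));
        System.arraycopy(data, 0, grown, offset, data.length);
        files.put(fileId, grown);
    }
}
```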
116. There are a number of benefits to programming with interfaces in
distributed systems, stemming from the important separation between
interface and implementation:
• As with any form of modular programming, programmers are concerned only
with the abstraction offered by the service interface and need not be aware
of implementation details.
• Extrapolating to (potentially heterogeneous) distributed systems,
programmers also do not need to know the programming language or
underlying platform used to implement the service (an important step towards
managing heterogeneity in distributed systems).
• This approach provides natural support for software evolution in that
implementations can change as long as the interface remains the same.
More correctly, the interface can also change as long as it remains compatible
with the original.
117. The definition of service interfaces is influenced by the distributed nature
of the underlying infrastructure:
• It is not possible for a client module running in one process to access the
variables in a module in another process. Therefore the service interface
cannot specify direct access to variables. Note that CORBA IDL interfaces
can specify attributes, which seems to break this rule. However, the
attributes are not accessed directly but by means of some getter and setter
procedures added automatically to the interface.
118. ● The parameter-passing mechanisms used in local procedure calls – for example, call by
value and call by reference – are not suitable when the caller and procedure are in
different processes. In particular, call by reference is not supported. Rather, the
specification of a procedure in the interface of a module in a distributed program
describes the parameters as input or output, or sometimes both.
● Input parameters are passed to the remote server by sending the values of the
arguments in the request message and then supplying them as arguments to the
operation to be executed in the server.
● Output parameters are returned in the reply message and are used as the result of the
call or to replace the values of the corresponding variables in the calling environment.
● When a parameter is used for both input and output, the value must be transmitted
in both the request and reply messages.
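The handling of input, output and input-output parameters can be sketched in Java as follows. This is a simplified illustration: the procedure divide, the classes ParamServer and ParamClient, and the use of maps as request and reply messages are assumptions, and the call is made in-process rather than over a network.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical remote procedure, written in an IDL-like style:
//   void divide(in int dividend, in int divisor, out int quotient, inout int callCount)

final class ParamServer {
    // The request carries only the in and inout values;
    // the reply carries only the out and inout values.
    static Map<String, Integer> divide(Map<String, Integer> request) {
        int dividend = request.get("dividend");
        int divisor = request.get("divisor");
        int callCount = request.get("callCount");

        Map<String, Integer> reply = new HashMap<>();
        reply.put("quotient", dividend / divisor);
        reply.put("callCount", callCount + 1);
        return reply;
    }
}

final class ParamClient {
    int quotient;       // receives the out parameter
    int callCount = 0;  // inout: sent in the request, replaced from the reply

    void divide(int dividend, int divisor) {
        Map<String, Integer> request = new HashMap<>();
        request.put("dividend", dividend);
        request.put("divisor", divisor);
        request.put("callCount", callCount);

        Map<String, Integer> reply = ParamServer.divide(request);

        // Output values replace the corresponding variables in the caller.
        quotient = reply.get("quotient");
        callCount = reply.get("callCount");
    }
}
```

Only values travel in either direction; no reference to the caller's variables is ever sent, which is why call by reference is not available.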
119. ● Another difference between local and remote modules is that addresses
in one process are not valid in another remote one. Therefore, addresses
cannot be passed as arguments or returned as results of calls to remote
modules.
These constraints have a significant impact on the specification of
interface definition languages
120. Interface definition languages
● An RPC mechanism can be integrated with a particular programming language if it
includes an adequate notation for defining interfaces, allowing input and output
parameters to be mapped onto the language’s normal use of parameters.
● This approach is useful when all the parts of a distributed application can be written in
the same language. It is also convenient because it allows the programmer to use a single
language, for example, Java, for local and remote invocation
● However, many existing useful services are written in C++ and other languages. It would
be beneficial to allow programs written in a variety of languages, including Java, to
access them remotely. Interface definition languages (IDLs) are designed to allow
procedures implemented in different languages to invoke one another. An IDL provides a
notation for defining interfaces in which each of the parameters of an operation may be
described as for input or output in addition to having its type specified
121.
122. Figure 5.8 shows a simple example of CORBA IDL. The Person structure is
the same as the one used to illustrate marshalling in Section 4.3.1. The
interface named PersonList specifies the methods available for RMI in a
remote object that implements that interface. For example, the method
addPerson specifies its argument as in, meaning that it is an input argument,
and the method getPerson that retrieves an instance of Person by name
specifies its second argument as out, meaning that it is an output argument.
123. The concept of an IDL was initially developed for RPC systems but applies
equally to RMI and also web services. Our case studies include:
• Sun XDR as an example of an IDL for RPC
• CORBA IDL as an example of an IDL for RMI
• Web Services Description Language (WSDL), which is designed for an
Internet-wide RPC supporting web services
• Protocol buffers used at Google for storing and interchanging many kinds
of structured information
124. 2. RPC call semantics
Maybe semantics:
With maybe semantics, the remote procedure call may be executed once or not at all. Maybe semantics
arises when no fault-tolerance measures are applied and can suffer from the following types of failure:
• omission failures if the request or result message is lost;
• crash failures when the server containing the remote operation fails.
At-least-once semantics:
With at-least-once semantics, the invoker receives either a result, in which case the invoker knows that
the procedure was executed at least once, or an exception informing it that no result was received. At-
least-once semantics can be achieved by the retransmission of request messages, which masks the
omission failures of the request or result message. At-least-once semantics can suffer from the
following types of failure:
• crash failures when the server containing the remote procedure fails;
• arbitrary failures – in cases when the request message is retransmitted, the remote server may
receive it and execute the procedure more than once, possibly causing wrong values to be stored
or returned.
If the operations in a server can be designed so that all of the procedures in their service
interfaces are idempotent operations, then at-least-once call semantics may be acceptable.
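The effect of retransmission under at-least-once semantics can be sketched with a small in-process simulation in Java. Everything here is hypothetical: the classes Account, LossyServer and RetryingClient, and the way a lost reply is modelled by returning null. The server executes every request it receives, so a retransmitted non-idempotent operation is applied twice.

```java
// Hypothetical account service with one non-idempotent and one idempotent operation.
final class Account {
    int balance = 0;

    int deposit(int amount) {      // not idempotent: repeating it changes the result
        balance += amount;
        return balance;
    }

    int setBalance(int value) {    // idempotent: repeating it has the same effect
        balance = value;
        return balance;
    }
}

// Simulated server that "loses" the first few reply messages.
final class LossyServer {
    final Account account = new Account();
    private int repliesToDrop;

    LossyServer(int repliesToDrop) {
        this.repliesToDrop = repliesToDrop;
    }

    // Returns null when the reply is lost; the operation has still been executed.
    Integer call(String operation, int argument) {
        int result = operation.equals("deposit")
                ? account.deposit(argument)
                : account.setBalance(argument);
        if (repliesToDrop > 0) {
            repliesToDrop--;
            return null;
        }
        return result;
    }
}

final class RetryingClient {
    // Retransmits the request until a reply arrives or the attempts run out.
    static int invoke(LossyServer server, String operation, int argument, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Integer reply = server.call(operation, argument);
            if (reply != null) {
                return reply;
            }
        }
        throw new IllegalStateException("no result received");
    }
}
```

With one lost reply, a deposit of 10 leaves the balance at 20, whereas setBalance(10) leaves it at 10 however many times the request is repeated.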
125. At-most-once semantics:
● With at-most-once semantics, the caller receives either a result, in which case the
caller knows that the procedure was executed exactly once, or an exception
informing it that no result was received, in which case the procedure will have
been executed either once or not at all.
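● At-most-once semantics can be achieved by using all of the fault-tolerance
measures: retransmission of request messages, filtering of duplicate requests,
and retransmission of stored replies from a history instead of re-executing
the procedure.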
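A minimal Java sketch of at-most-once behaviour follows, assuming the measures in question are request retransmission, duplicate filtering by request identifier, and retransmission of a stored reply. The classes DedupServer and DedupClient and the null-for-lost-reply convention are hypothetical, and the history is never pruned in this simplified version.

```java
import java.util.HashMap;
import java.util.Map;

// Simulated server that filters duplicates and keeps a history of replies.
final class DedupServer {
    int balance = 0;
    private final Map<Long, Integer> history = new HashMap<>();
    private int repliesToDrop;

    DedupServer(int repliesToDrop) {
        this.repliesToDrop = repliesToDrop;
    }

    // Returns null when the reply is "lost" on the way back to the client.
    Integer deposit(long requestId, int amount) {
        Integer reply = history.get(requestId);
        if (reply == null) {
            // First time this request identifier is seen: execute and record the reply.
            balance += amount;
            reply = balance;
            history.put(requestId, reply);
        }
        // For a duplicate, the stored reply is resent without re-executing.
        if (repliesToDrop > 0) {
            repliesToDrop--;
            return null;
        }
        return reply;
    }
}

final class DedupClient {
    private long nextRequestId = 1;

    // Retransmits with the same request identifier until a reply arrives.
    int deposit(DedupServer server, int amount, int maxAttempts) {
        long requestId = nextRequestId++;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Integer reply = server.deposit(requestId, amount);
            if (reply != null) {
                return reply;
            }
        }
        throw new IllegalStateException("no result received");
    }
}
```

Even when the first reply is lost and the request is retransmitted, the deposit is applied once, so the non-idempotent operation is safe.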
126. 3. Transparency
● The originators of RPC, Birrell and Nelson [1984], aimed to make remote
procedure calls as much like local procedure calls as possible, with no
distinction in syntax between a local and a remote procedure call.
● All the necessary calls to marshalling and message-passing procedures
were hidden from the programmer making the call. Although request
messages are retransmitted after a timeout, this is transparent to the
caller, so that the semantics of remote procedure calls are like those of
local procedure calls.
128. ● The client that accesses a service includes one stub procedure for each procedure in
the service interface. The stub procedure behaves like a local procedure to the client,
but instead of executing the call, it marshals the procedure identifier and the
arguments into a request message, which it sends via its communication module to the
server. When the reply message arrives, it unmarshals the results.
● The server process contains a dispatcher together with one server stub procedure and
one service procedure for each procedure in the service interface. The dispatcher
selects one of the server stub procedures according to the procedure identifier in the
request message. The server stub procedure then unmarshals the arguments in the
request message, calls the corresponding service procedure and marshals the return
values for the reply message. The service procedures implement the procedures in the
service interface
● The client and server stub procedures and the dispatcher can be generated
automatically by an interface compiler from the interface definition of the service.
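The roles of the client stub, dispatcher, server stub and service procedure can be sketched in Java as below. This is a hand-written, in-process illustration, not the output of any interface compiler: the Calculator interface, the class names and the procedure identifier ADD_ID are hypothetical, and a direct method call on the server object stands in for the communication module.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical service interface.
interface Calculator {
    int add(int a, int b);
}

// Service procedure: the actual implementation on the server.
final class CalculatorService implements Calculator {
    @Override
    public int add(int a, int b) {
        return a + b;
    }
}

final class CalculatorServer {
    static final int ADD_ID = 1;
    private final Calculator service = new CalculatorService();

    // Dispatcher: selects a server stub from the procedure identifier.
    byte[] dispatch(byte[] request) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(request));
        int procedureId = in.readInt();
        if (procedureId == ADD_ID) {
            return addServerStub(in);
        }
        throw new IOException("unknown procedure identifier " + procedureId);
    }

    // Server stub: unmarshals arguments, calls the service procedure,
    // and marshals the return value into the reply message.
    private byte[] addServerStub(DataInputStream in) throws IOException {
        int a = in.readInt();
        int b = in.readInt();
        int result = service.add(a, b);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        new DataOutputStream(buffer).writeInt(result);
        return buffer.toByteArray();
    }
}

// Client stub: looks like a local Calculator to the caller.
final class CalculatorClientStub implements Calculator {
    private final CalculatorServer server; // stands in for the communication module

    CalculatorClientStub(CalculatorServer server) {
        this.server = server;
    }

    @Override
    public int add(int a, int b) {
        try {
            // Marshal the procedure identifier and arguments into a request message.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(CalculatorServer.ADD_ID);
            out.writeInt(a);
            out.writeInt(b);
            byte[] reply = server.dispatch(buffer.toByteArray());
            // Unmarshal the result from the reply message.
            return new DataInputStream(new ByteArrayInputStream(reply)).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would write `Calculator c = new CalculatorClientStub(new CalculatorServer());` and then call `c.add(2, 3)` exactly as it would a local object, with all marshalling hidden inside the stub.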