The document discusses several key issues related to data communication between computers including different data formats, marshalling or converting data to a common format for transmission, and representation standards like CORBA CDR, Java serialization, and XML. It also covers remote object references, multicast communication, and the failure characteristics of unreliable multicast.
[Distributed System] ch4. interprocess communication
1.
2. • Data Problems in Communications
• Computers store primitive values in different byte orders (big-endian, little-endian)
• Operating systems use different character code sets (UTF-8, CP949, etc.)
• Two methods enable any two computers to exchange binary data values
• The values are converted to an agreed external format before transmission and converted to the local
form on receipt
• The values are transmitted in the sender’s format, together with an indication of the format used, and
the recipient converts the values if necessary.
• An agreed standard for the representation of data structures and primitive values is
called an external data representation.
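The byte-order problem can be illustrated with a minimal sketch. The class name ByteOrderDemo and its helper methods are hypothetical, chosen only for this example; it encodes the same int in both byte orders and shows what happens when the receiver assumes the wrong one.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Encode an int as four bytes in the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Decode four bytes as an int, interpreting them in the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] big = encode(1984, ByteOrder.BIG_ENDIAN);       // bytes 00 00 07 C0
        byte[] little = encode(1984, ByteOrder.LITTLE_ENDIAN); // bytes C0 07 00 00
        System.out.println(decode(big, ByteOrder.BIG_ENDIAN));       // same order on both sides
        System.out.println(decode(little, ByteOrder.LITTLE_ENDIAN)); // same order on both sides
        System.out.println(decode(big, ByteOrder.LITTLE_ENDIAN));    // mismatch: a different value
    }
}
```

Agreeing on one external byte order, or sending an indication of the order used, removes the mismatch shown in the last call.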
3. • Marshalling is the process of taking a collection of data items and assembling them
into a form suitable for transmission in a message.
• Three alternative approaches to external data representation and marshalling
• CORBA’s common data representation (CDR)
• Java’s object serialization
• XML (Extensible Markup Language)
• Marshalling should be carried out by software (a middleware layer), not by hand by the application programmer.
• Because marshalling must take into account every detail of the representation of a composite object, it is
error-prone if done manually.
• Design issues of marshalling: file format, type information
4. • CORBA CDR is the external data representation defined
with CORBA 2.0 [OMG 2004a].
• It can represent both primitive types and constructed types.
Type         Representation
sequence     length (unsigned long) followed by elements in order
string       length (unsigned long) followed by characters in order (can also have wide characters)
array        array elements in order (no length specified because it is fixed)
struct       in the order of declaration of the components
enumerated   unsigned long (the values are specified by the order declared)
union        type tag followed by the selected member
5. The flattened form represents a Person struct with value: {‘Smith’, ‘London’, 1984}
index in sequence of bytes   4 bytes   notes on representation
0–3                          5         length of string
4–7                          "Smit"    ‘Smith’
8–11                         "h___"
12–15                        6         length of string
16–19                        "Lond"    ‘London’
20–23                        "on__"
24–27                        1984      unsigned long
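The 28-byte layout above can be reproduced with a minimal sketch. It follows the simplified layout of this slide only (4-byte length, characters, zero padding to a 4-byte boundary, big-endian integers) and is not a full CDR implementation; the class CdrSketch and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class CdrSketch {
    // Round a length up to the next multiple of four.
    static int padded(int n) {
        return (n + 3) / 4 * 4;
    }

    // Write a string as on the slide: 4-byte length, the characters,
    // then zero bytes up to the next 4-byte boundary.
    static void putString(ByteBuffer buf, String s) {
        byte[] chars = s.getBytes(StandardCharsets.US_ASCII);
        buf.putInt(chars.length);
        buf.put(chars);
        while (buf.position() % 4 != 0) {
            buf.put((byte) 0);
        }
    }

    // Flatten the Person struct {name, place, year} in order of declaration.
    static byte[] marshalPerson(String name, String place, int year) {
        int size = 4 + padded(name.length()) + 4 + padded(place.length()) + 4;
        ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.BIG_ENDIAN);
        putString(buf, name);
        putString(buf, place);
        buf.putInt(year);
        return buf.array();
    }
}
```

Calling CdrSketch.marshalPerson("Smith", "London", 1984) yields 28 bytes, matching the seven 4-byte rows of the table. No type information is written, so the receiver must already know the order and types of the fields.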
6. • The type of a data item is not given with its representation in the message: it is assumed that the sender
and recipient have common knowledge of the order and types of the data items
• Types of data structures and types of basic data items are described in CORBA IDL
• CORBA IDL provides a notation for describing the types of arguments and results of RMI methods
7. • Both objects and primitive data values may be passed as arguments and results of
method invocations.
• The following Java class is equivalent to Person struct
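A minimal sketch of such a class is given here. The field names name, place and year are taken from the serialized form shown on a later slide; the accessor methods and the serialVersionUID value are assumptions added for illustration.

```java
import java.io.Serializable;

class Person implements Serializable {
    // Hypothetical version number; any fixed value works for this sketch.
    private static final long serialVersionUID = 1L;

    private final String name;
    private final String place;
    private final int year;

    Person(String aName, String aPlace, int aYear) {
        name = aName;
        place = aPlace;
        year = aYear;
    }

    String getName() { return name; }
    String getPlace() { return place; }
    int getYear() { return year; }
}
```

Implementing Serializable is what allows instances of the class to be passed as arguments and results of remote method invocations.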
8. • Serialization: flattening objects into a serial form for storing on disk or transmitting in
a message.
• It is assumed that the process that deserializes has no prior knowledge of the types of the objects in the serialized form
• Some information about the class of each object is included in the serialized form
• Java objects can contain references to other objects.
• When an object is serialized, all the objects that it references are serialized with it
• References are serialized as handles
• A handle is a reference to an object within the serialized form
• Each object is written only once
• The handle is written for each subsequent occurrence
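The effect of handles can be observed with a small sketch. HandleDemo and its nested Pair class are hypothetical names for this example: one object is referenced twice, serialized, and read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class HandleDemo {
    // An object holding two references to other objects.
    static class Pair implements Serializable {
        private static final long serialVersionUID = 1L;
        final Object first;
        final Object second;
        Pair(Object first, Object second) {
            this.first = first;
            this.second = second;
        }
    }

    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if both fields still refer to one single object after a round trip.
    static boolean sharedAfterRoundTrip() {
        int[] shared = {1984};
        Pair copy = (Pair) deserialize(serialize(new Pair(shared, shared)));
        return copy.first == copy.second;
    }
}
```

Because the second occurrence is written as a handle rather than as a second copy, the deserialized Pair refers to one array, and the stream is shorter than it would be for two distinct arrays with equal contents.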
9. • To serialize an object:
1. Its class information is written out: name, version number
2. The types and names of its instance variables are written out
* If an instance variable belongs to a new class, then the new class information must be
written out, recursively.
* Each class is given a handle
3. The values of its instance variables are written out
Ex) Person p = new Person("Smith", "London", 1984);
Serialized values                                                  Explanation
Person   8-byte version number   h0                                class name, version number
3   int year   java.lang.String name:   java.lang.String place:    number, type and name of instance variables
1984   5 Smith   6 London   h1                                     values of instance variables
The true serialized form contains additional type markers; h0 and h1 are handles
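The contents of the serialized form can be inspected with a short sketch. SerializationDemo is a hypothetical wrapper class, and its nested Person mirrors the example above; the sketch serializes the object to a byte array so that the class name, field names and values can be searched for in the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationDemo {
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L; // hypothetical version number
        final String name;
        final String place;
        final int year;
        Person(String name, String place, int year) {
            this.name = name;
            this.place = place;
            this.year = year;
        }
    }

    // Flatten a Person into its serialized form.
    static byte[] serialize(Person p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild a Person from its serialized form.
    static Person deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Person) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Viewing the bytes as text shows the class name, the field names name, place and year, and the values Smith and London, which is the class information that lets a recipient deserialize without prior knowledge of the type.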
10. • XML is a markup language that was defined by the World Wide Web Consortium
(W3C) for general use on the Web.
• XML data items are tagged with ‘markup’ strings.
• XML is used to enable clients to communicate with web services and for defining the interfaces and
other properties of web services.
• XML is extensible in the sense that users can define their own tags
• XML documents, being textual, can be read by humans.
11. • XML elements and attributes
• Elements: An element in XML consists of a portion of character data surrounded by matching start and
end tags.
• Attributes: A start tag may optionally include pairs of associated attribute names and values, such as
id="123456789".
• Parsing and well-formed documents
• Rules for a well-formed document
• Every start tag has a matching end tag
• All tags are correctly nested
• Every XML document must have a single root element
• CDATA: how to represent special characters
• XML prolog: version, encoding
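The well-formedness rules can be checked with the XML parser that ships with the JDK. This is a minimal sketch; the class name WellFormedCheck and the sample documents in the usage note are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the parser accepts the text as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, <person id="123456789"><name>Smith</name></person> is accepted, while <person><name>Smith</person></name> (bad nesting) and <name>Smith</name><place>London</place> (two root elements) are rejected. A bare & in character data is also rejected unless it is wrapped in a CDATA section.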
12. • XML namespaces: a set of names for a collection of element types and attributes
• A namespace is referenced by a URL
• allows an application to make use of multiple sets of external definitions in different namespaces
without the risk of name clashes.
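How namespaces prevent name clashes can be sketched as follows. The two namespace URLs, the prefixes p and f, and the class NamespaceDemo are hypothetical; both vocabularies define an element called name, and the namespace-aware parser keeps them apart.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NamespaceDemo {
    // Hypothetical namespace URLs, used only as identifiers in this sketch.
    static final String PERSON_NS = "http://example.org/person";
    static final String FILE_NS = "http://example.org/file";

    static final String XML =
        "<record xmlns:p=\"" + PERSON_NS + "\" xmlns:f=\"" + FILE_NS + "\">"
        + "<p:name>Smith</p:name><f:name>smith.txt</f:name></record>";

    // Return the text of the first element with this local name in this namespace, or null.
    static String textOf(String namespace, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in the JDK parser
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = doc.getElementsByTagNameNS(namespace, localName);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Looking up name in PERSON_NS gives Smith and in FILE_NS gives smith.txt, even though both elements share the same local name.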
13. • XML schemas: define the permitted structure of a document
• A schema defines the elements and attributes that can appear in a document
• It also defines how the elements are nested, the order and number of elements, and whether an element is
empty or can include text
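A schema for the Person structure and a validity check against it can be sketched as below. The schema text is an assumption written for this example (element names follow the Person struct used earlier), and SchemaCheck is a hypothetical class name; validation uses the javax.xml.validation API of the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaCheck {
    // Assumed schema: a person has name, place and year in that order, plus an optional id.
    static final String SCHEMA =
        "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xsd:element name=\"person\"><xsd:complexType>"
        + "<xsd:sequence>"
        + "<xsd:element name=\"name\" type=\"xsd:string\"/>"
        + "<xsd:element name=\"place\" type=\"xsd:string\"/>"
        + "<xsd:element name=\"year\" type=\"xsd:positiveInteger\"/>"
        + "</xsd:sequence>"
        + "<xsd:attribute name=\"id\" type=\"xsd:positiveInteger\"/>"
        + "</xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Returns true if the document is valid with respect to SCHEMA.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document violates the schema or is not well formed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A document with the elements in the declared order is accepted; swapping name and place, omitting an element, or giving a non-numeric year makes it invalid even though it is still well formed.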
14. • Each process contains objects, some of which can receive remote invocations, others
only local invocations.
• Those that can receive remote invocations are called remote objects
• Objects need to know the remote object reference of an object in another process in
order to invoke its methods.
• A remote object reference is an identifier for a remote object that is valid throughout
a distributed system.
15. • Remote object references must be generated in a manner that ensures uniqueness
over space and time.
• Even after a remote object is deleted, its remote object reference must not be reused
• Example of a unique remote object reference
• Concatenate the Internet address of its computer and the port number of the process that created it with
the time of its creation and a local object number
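The concatenation scheme can be sketched as a small value class. RemoteObjectRef, its Factory and the choice of field types are assumptions for illustration; the four components are the ones listed on the slide.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;

final class RemoteObjectRef {
    final String internetAddress; // address of the computer hosting the object
    final int port;               // port of the process that created the object
    final long creationTime;      // time of creation, in milliseconds
    final int objectNumber;       // local object number within the process

    RemoteObjectRef(String internetAddress, int port, long creationTime, int objectNumber) {
        this.internetAddress = internetAddress;
        this.port = port;
        this.creationTime = creationTime;
        this.objectNumber = objectNumber;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof RemoteObjectRef)) return false;
        RemoteObjectRef r = (RemoteObjectRef) other;
        return internetAddress.equals(r.internetAddress) && port == r.port
            && creationTime == r.creationTime && objectNumber == r.objectNumber;
    }

    @Override
    public int hashCode() {
        return Objects.hash(internetAddress, port, creationTime, objectNumber);
    }

    // Creates references on behalf of one process; the counter is never reset,
    // so an object number is not handed out twice by the same factory.
    static final class Factory {
        private final String internetAddress;
        private final int port;
        private final AtomicInteger nextObjectNumber = new AtomicInteger();

        Factory(String internetAddress, int port) {
            this.internetAddress = internetAddress;
            this.port = port;
        }

        RemoteObjectRef newRef() {
            return new RemoteObjectRef(internetAddress, port,
                    System.currentTimeMillis(), nextObjectNumber.getAndIncrement());
        }
    }
}
```

The address and port distinguish processes (uniqueness over space), while the creation time and the ever-increasing object number distinguish objects created by one process (uniqueness over time).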
16. • Multicast communication is an appropriate model for communication from one process
to a group of other processes.
• Four cases where multicast messages are used
• Fault tolerance based on replicated services
• Discovering services in spontaneous networking
• Better performance through replicated data
• Propagation of event notifications
17. • IP multicast
• built on top of the Internet Protocol
• allows the sender to transmit a single IP packet to a set of computers that form a multicast group.
• multicast group is specified by a Class D Internet address
• following details are specific to IPv4
• Multicast routers
• Multicast address allocation
18. • Multicast Routers
• IP packets can be multicast both on a local network and on the wider Internet.
• Local multicasts use the multicast capability of the local network, for example, of an Ethernet.
• Internet multicasts make use of multicast routers, which forward single datagrams to routers on other
networks, where they are again multicast to local members.
• To limit the distance of propagation of a multicast datagram, the sender can specify the number of
routers it is allowed to pass – called the time to live, or TTL for short.
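Joining a group and sending a datagram with a limited TTL can be sketched with java.net.MulticastSocket. The group address 239.1.2.3, the port 6789 and the class name MulticastPeerSketch are hypothetical values chosen for this example, and whether the datagram is actually delivered depends on the local network configuration.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

class MulticastPeerSketch {
    static final String GROUP = "239.1.2.3"; // hypothetical group address
    static final int PORT = 6789;            // hypothetical port

    static InetAddress group() {
        try {
            return InetAddress.getByName(GROUP); // a literal address, so no DNS lookup
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    // Join the group, multicast one message that may pass at most ttl routers,
    // then wait up to two seconds for one datagram sent to the group.
    static String sendAndReceive(String message, int ttl) throws IOException {
        InetSocketAddress groupAddress = new InetSocketAddress(group(), PORT);
        try (MulticastSocket socket = new MulticastSocket(PORT)) {
            socket.joinGroup(groupAddress, null); // null selects the default interface
            socket.setTimeToLive(ttl);
            byte[] out = message.getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(out, out.length, group(), PORT));

            byte[] buffer = new byte[1000];
            DatagramPacket in = new DatagramPacket(buffer, buffer.length);
            socket.setSoTimeout(2000);
            socket.receive(in); // throws SocketTimeoutException if nothing arrives
            socket.leaveGroup(groupAddress, null);
            return new String(in.getData(), 0, in.getLength(), StandardCharsets.UTF_8);
        }
    }
}
```

A TTL of 1 keeps the datagram on the local network; larger values let multicast routers forward it further.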
19. • Multicast address allocation
• Class D addresses (that is, addresses in the range 224.0.0.0 to 239.255.255.255) are reserved for
multicast traffic
• The range is divided into the Local Network Control Block, Internet Control Block, Ad Hoc Control Block and
Administratively Scoped Block
• Multicast addresses may be permanent or temporary
• When a temporary group is created, it requires a free multicast address to avoid
accidental participation in an existing group.
• The IP multicast protocol does not directly address this issue.
• If the group is used only locally, setting the TTL to a small value makes collisions with other groups unlikely.
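The Class D range can be tested directly: an IPv4 address is in 224.0.0.0 to 239.255.255.255 exactly when the top four bits of its first octet are 1110. The class ClassDCheck below is a hypothetical helper written for this sketch.

```java
class ClassDCheck {
    // Returns true if the dotted-quad IPv4 address lies in 224.0.0.0 to 239.255.255.255.
    static boolean isClassD(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            return false;
        }
        int[] octets = new int[4];
        for (int i = 0; i < 4; i++) {
            try {
                octets[i] = Integer.parseInt(parts[i]);
            } catch (NumberFormatException e) {
                return false; // not a number
            }
            if (octets[i] < 0 || octets[i] > 255) {
                return false; // not a valid octet
            }
        }
        // Top four bits of the first octet must be 1110.
        return (octets[0] & 0xF0) == 0xE0;
    }
}
```

A process creating a temporary group could use such a check to confirm that a candidate address is a multicast address at all; it does not tell whether the address is already in use by another group.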
20. • Failure model for multicast datagrams
• Datagrams multicast over IP multicast have the same failure characteristics as UDP datagrams
• IP multicast is called unreliable multicast because it does not guarantee that a message will be delivered
to any particular member of a group.
21.
22. • Failure characteristics of unreliable IP multicast
• A datagram sent from one multicast router to another may be lost
• A recipient may drop the message because its buffer is full
• The effect of the failure semantics of IP multicast on the four examples
• Fault tolerance based on replicated services
• Discovering services in spontaneous networking
• Better performance through replicated data
• Propagation of event notifications
• The examples also suggest that some applications have strong requirements for
ordering, the strictest of which is called totally ordered multicast.