Distributed Operating System and
Department Of I.T,CE-Poonjar
I. Presents users (and applications) with an integrated
computing platform that hides the individual computers.
II. Has control over all of the nodes (computers) in the network
and allocates their resources to tasks without user
III. the user doesn't know (or care) where his programs are
Ex:Cluster computer systems,V system, Sprite
Middleware implements abstractions that support network-wide
RPC and RMI (Sun RPC, Corba, Java RMI)
event distribution and filtering (Corba Event Notification, Elvin)
resource discovery for mobile and ubiquitous computing
support for multimedia streaming
Functions that OS should provide for middleware
provide a set of operations that meet their clients’ needs
protect resource from illegitimate access
3. Concurrent processing
support clients access resource concurrently
4. Invocation mechanism: a means of accessing an encapsulated
Communication : Pass operation parameters and results between
Scheduling: Schedule the processing of the invoked operation
Core OS functionality
Thread manager Memory manager
Handles the creation of and operations upon processes.
Thread creation, synchronization and scheduling
Communication between threads attached to different
processes on the same computer
Management of physical and virtual memory
Dispatching of interrupts, system call traps and other exceptions
control of memory management unit and hardware caches
processor and floating point unit register manipulations
Kernel and Protection
complete access privileges for the physical resources
Different execution mode
supervisor mode (kernel process) / user mode (user process)
system call trap: invocation mechanism for resources managed by
An address space: a collection of ranges of virtual memory locations,
in each of which a specified combination of memory access rights
applies, e.g.: read only or read-write
The cost for protection
switching between different processes take many processor cycles
a system call trap is a more expensive operation than a simple method
Processes and Threads
A program in execution
Problem: sharing between related activities are awkward and
a process consists of an execution environment together with
one or more threads.
An execution environment is a collection of local kernel managed
resources to which its threads have access. An execution
environment primarily consists of:
an address space;
thread synchronization and communication resources such as
semaphores and communication interfaces (for example, sockets);
higher-level resources such as open files and windows.
Process address space
a unit of management of a process’s
Up to 232 bytes and sometimes up to
consists of one or more regions.
an area of continuous virtual memory
that is accessible by the threads of the
can be shared
1. kernel code
3. shared data & communication
4. Copy on write
Creation of new process in distributed system
Creating process by the operation system
Fork, in UNIX
Process creation in distributed system
The choice of a target host
Choice of process host
running new processes at their originator’s computer
sharing processing load between a set of computers
Load sharing system
Centralized, Decentralized, Hierarchical
Load sharing policy
Transfer policy: situate a new process locally or remotely
Location policy: which node should host the new process
1. Static policy without regard to the current state of the system
2.Adaptive policy applies heuristics to make their allocation
Creation of a new execution environment
1. Initializing the address space
Statically defined format
With respect to an existing execution environment,
2. Copy-on-write scheme
A technique that is used to reduce amount of data copy.
Copy-on-write – a convenient optimization
a) Before write b) After write
Process A’s address space Process B’s address space
Threads concept and implementation
threads were introduced as a lightweight – operating system
a process consists of its address-space, and a set of threads attached
to that process.
The operating system can perform less expensive context switches
between threads attached to the same process and threads attached to
the same process can access the same memory etc, such that
communication/ synchronisation can be much cheaper and less
A server application generally
A single thread, the receiver-
thread which receives all the
requests, places them in a queue
and dispatches those requests to
be dealt with by the worker-
The worker-thread which deals
with the request may be a thread
in the same process or it may be
a thread in another
Client and server with threads Worker pool
Server creates a fixed pool of
“worker” threads to process the
requests when it starts up.
The module marked ‘receipt
and queuing is typically
implemented by an ‘I/O’ thread,
1. Inflexibility: worker threads
number unequal to current
2. High level of switching
between the I/O and worker
For single thread, server can handle 100 client
requests per second
Alternative server threading architectures
Server spawn a new worker thread for each new request, destroy it
when the request processing finish.
Pro: throughput is potentially maximized.
Con: overhead of the thread creation and destruction
Server creates a new worker thread when client creates a connection,
destroys the thread when the client closes the connection.
Pro: lower thread management overheads compared with the thread-
Con: client may be delayed while a worker thread has several
outstanding requests but another thread has no work to perform
Associate a thread with each remote object
Pro&Con are similar to thread-per-connection
Threads versus multiple processes
Creating a thread is (much) cheaper than a process (~10-20times)
Switching to a different thread in same process is (much) cheaper
Threads within same process can share data and other resources
more conveniently and efficiently (without copying or messages)
Threads within a process are not protected from each other
can be implemented:
in the OS kernel (Win NT, Solaris, Mach)
at user level e.g. by a thread library, or in the language (Ada, Java).
Java thread constructor and management methods
• Thread(ThreadGroup group, Runnable target, String name)
Creates a new thread in the SUSPENDED state, which will
belong to group and be identified as name; the thread will
execute the run() method of target.
• setPriority(int newPriority), getPriority()
Set and return the thread’s priority.
A thread executes the run() method of its target object, if it has
one, and otherwise its own run() method (Thread implements
Methods of objects that inherit from class Thread
Change the state of the thread from SUSPENDED to
• sleep(int millisecs)
Cause the thread to enter the SUSPENDED state for the
Enter the READY state and invoke the scheduler.
Destroy the thread.
Java thread synchronization calls
• thread.join(int millisecs)
Blocks the calling thread for up to the specified time until thread
Interrupts thread: causes it to return from a blocking method call
such as sleep().
• object.wait(long millisecs, int nanosecs)
Blocks the calling thread until a call made to notify() or
notifyAll() on object wakes the thread, or the thread is interrupted,
or the specified time has elapsed.
• object.notify(), object.notifyAll()
Wakes, respectively, one or all of any threads that have called
wait() on object. Deepak John,CE-Poonjar
Operating system architecture
Monolithic Kernel Microkernel
Server: Dynamically loaded server program:Kernel code and data:
S1 S2 S3
S2 S3 S4
Monolithic kernel and microkernel
Monolithic vs Microkernel
A monolithic kernel provides all of the services via a single
image, that is a single program initialized when the computer boots
A microkernel instead implements only the absolute minimum:
Basic virtual memory, Basic scheduling and Inter-process
All other services such as device drivers, the file system,
networking etc are implemented as user-level server processes that
communicate with each other and the kernel via IPC
The Microkernel Approach
The major advantages of the microkernel approach include:
Extensibility - major functionality can be added without modifying
the core kernel of the operating system
Modularity - the different functions of the operating system can be
forced into modularity behind memory protection barriers. A
monolithic kernel must use programming language features or code
conventions to attempt to ensure this
Robustness -relatively small kernel might be likely to contain fewer
bugs than a larger program, however, this point is rather contentious
Portability - since only a small portion of the operating system, its
smaller kernel, relies on the particulars of a given machine it is easier
to port to a new machine architecture
The role of the microkernel
The microkernel supports middleware via subsystems
The Monolithic Approach
The major advantage of the monolithic approach is the relative
efficiency with which operations may be invoked. Since services
share an address space with the core of the kernel they need not make
system calls to access core-kernel functionality
Most operating systems in use today are a kind of hybrid solution
Linux is a monolithic kernel, but modules may be dynamically
loaded and unloaded at run time.
Distributed File Systems
Purposes of a Distributed File System
Sharing of storage and information across a network
Convenience (and efficiency) of a conventional file system
Persistent storage that most other services (e.g., Web servers) need
Files are an abstraction of permanent storage.
A file is typically defined as a sequence of similar-sized data items
along with a set of attributes.
A directory is a file that provides a mapping from text names to
internal file identifiers.
Access control list
E.g. for UNIX: rw-rw-r--
Responsible for the (a) organization, (b)storage, (c) retrieval, (d)
naming, (e)sharing, and (f) protection of files.
Provide a set of programming operations that characterize the file
abstraction,particularly operations to read and write subsequences of
data items beginning at any point of a file.
Directory module: relates file names to file IDs
File module: relates file IDs to particular files
Access control module: checks permission for operation requested
File access module: reads or writes file data or attributes
Block module: accesses and allocates disk blocks
Device module: disk I/O and buffering
UNIX file system operations
Distributed File System Requirements
File Service Architecture
An architecture that offers a clear separation of the main concerns
in providing access to files is obtained by structuring the file
service as three components:
A flat file service
A directory service
A client module.
The Client module implements exported interfaces by flat file and
directory services on server side.
Client computer Server computer
Flat file service
Flat file service:
Concerned with the implementation of operations on the contents of file.
Unique File Identifiers (UFIDs) are used to refer to files in all requests
for flat file service operations.
Provides mapping between text names for the files and their UFIDs.
Clients may obtain the UFID of a file by quoting its text name to
directory service. Directory service supports functions needed generate
directories, to add new files to directories.
It runs on each computer and provides integrated service (flat file and
directory) as a single API to application programs.
It holds information about the network locations of flat-file and directory
server processes; and achieve better performance through
implementation of a cache of recently used file blocks at the client.
Flat file service operations
Read(FileId, i, n) -> Data
— throws BadPosition
If 1 ≤ i ≤ Length(File): Reads a sequence of up to n items
from a file starting at item i and returns it in Data.
Write(FileId, i, Data)
— throws BadPosition
If 1 ≤ i ≤ Length(File)+1: Writes a sequence of Data to a
file, starting at item i, extending the file if necessary.
Create() -> FileId Creates a new file of length 0 and delivers a UFID for it.
Delete(FileId) Removes the file from the file store.
GetAttributes(FileId) -> Attr Returns the file attributes for the file.
SetAttributes(FileId, Attr) Sets the file attributes
UNIX checks access rights when a file is opened ,subsequent
checks during read/write are not necessary
a. server has to check.
b. stateless approaches
1. access check once when UFID is issued
• client gets an encoded "capability" (who can access and how)
• capability is submitted with each subsequent request
2. access check for each request.
A collection of files that can be located
on any server or moved between
servers while maintaining the same
Similar to a UNIX filesystem
Helps with distributing the load of
file serving between several servers.
File groups have identifiers which
are unique throughout the system
(and hence for an open system, they
must be globally unique).
Used to refer to file groups and
To construct a globally
unique ID we use some
unique attribute of the
machine on which it is
created, e.g. IP number,
even though the file group
may move subsequently.
IP address date
32 bits 16 bits
File Group ID:
Case Study: Sun NFS
An industry standard for file sharing on local networks since the
An open standard with clear and simple interfaces.
udp or tcp
Supports many of the design requirements already mentioned:
Limited achievement of:
Client computer Server computer
Virtual file systemVirtual file system
on local files
Virtual file system
part of unix kernel
NFS file handle, 3 components:
different groups of files
i-node (index node)
structure for finding the file
i-node generation number
i-nodes are reused
incremented when reused
struct for each file system
v-node for each open file
file handle for remote file
i-node number for local file
NFS access control and authentication
Stateless server, so the user's identity and access rights must be
checked by the server on each request.
In the local file system they are checked only on open()
Every client request is accompanied by the userID and groupID
Server is exposed to imposter attacks unless the userID and groupID
are protected by encryption
Kerberos has been integrated with NFS to provide a stronger and
more comprehensive security solution
the process of including a new file system is called mounting.
communicates with the mount process on the server in a mount
Server maintains a table of clients who have mounted filesystems at
Each client maintains a table of mounted file systems holding.
user process is suspended until request is successful
when server is not responding
request is retried until it's satisfied
if server fails, client returns failure after a small number of retries
user process handles the failure Deepak John,CE-Poonjar
THE MOUNT PROTOCOL:
The following operations occur:
1. The client's request is sent via RPC to the mount server ( on
2. Mount server checks export list containing
a) file systems that can be exported,
b) legal requesting clients.
c) It's legitimate to mount any directory within the legal
3. Server returns "file handle" to client.
4. Server maintains list of clients and mounted directories -- this is
state information! But this data is only a "hint" and isn't treated as
5. Mounting often occurs automatically when client or server boots.
Local and remote file systems
jim jane joeann
Client Server 2
. . . nfs
. . .
Note: The file system mounted at /usr/students in the client is actually the sub-tree
located at /export/people in Server 1; the file system mounted at /usr/staff in the client is
actually the sub-tree located at /nfs/users in Server 2.
server doesn't receive the entire pathname for translation,
client breaks down the pathnames into parts
iteratively translate each part
translation is cached
NFS client catches attempts to access 'empty' mount points and routes
them to the Automounter
Automounter has a table of mount points and multiple candidate
serves for each
it sends a probe message to each candidate server and then uses the
mount service to mount the filesystem at the first server to respond
Keeps the mount table small
caching file pages, directory/file attributes
read-ahead: prefetch pages following the most-recently read file
delayed-write: write to disk when the page in memory is needed
for other purposes
"sync" flushes "dirty" pages to disk every 30 seconds
two write option
1. write-through: write to disk before replying to the client
2. cache and commit:
stored in memory cache
write to disk before replying to a "commit" request from the
client Deepak John,CE-Poonjar
caches results of read, write, getattr, lookup, readdir
clients responsibility to poll the server for consistency
timestamp-based methods for consistency validation
Tc: time when the cache entry was last validated
Tm: time when the block was last modified at the server
cache entry is valid if:
1. T - Tc < t, where t is the freshness interval
t is adaptively adjusted:
files: 3 to 30 seconds depending on freq of updates
directories: 30 to 60 seconds
2. Tmclient = Tmserver
need validation for all cache accesses
dirty: modified page in cache
flush to disk: file is closed or sync from client
bio-daemon (block input-output)
read-ahead: after each read request, request the next file block from
the server as well
delayed write: after a block is filled, it's sent to the server
reduce the time to wait for read/write
Summary for NFS
access transparency: same system calls for local or remote files
location transparency: could have a single name space for all files
(depending on all the clients to agree the same name space)
mobility transparency: mount table need to be updated on each client
scalability: can usually support large loads, add processors, disks,
file replication: read-only replication, no support for replication of
files with updates
hardware and OS: many ports
fault tolerance: stateless and idempotent
consistency: not quite one-copy for efficiency
security: added encryption--Kerberos
efficiency: pretty efficient, wide-spread use
Device independent address (e.g., you can move domain
name/web site from one server to another server seamlessly).
In a Distributed System, a Naming Service is a specific service
whose aim is to provide a consistent and uniform naming of
resources, this allowing other programs or services to localize
them and obtain the required metadata for interacting with
Requirements for name spaces
Allow simple but meaningful names to be used
Potentially infinite number of names
to allow similar subnames without clashes
to group related names
Management of trust
A client iteratively contacts name servers NS1–NS3 in order to resolve a name
Iterative Navigation is the act of chaining multiple Naming Services in
order to resolve a single name to the corresponding resource.
Iterative Navigation is the act of chaining multiple Naming Services in
order to resolve a single name to the corresponding resource.
The Iterative Navigation can be…
it is performed by the naming server
the server becomes like a client for the next server
it is performed by the client or the first server
if it is performed by the server it is “server controlled”
the server bounces back the next hop to its client
Non-recursive and recursive server-controlled navigation
A name server NS1 communicates with other name servers on behalf of a client
DNS offers recursive navigation as an option, but iterative is the
standard technique. Recursive navigation must be used in domains
that limit client access to their DNS information for security reasons.
The SNS - a Simple Name Service model
Stores attributes of named objects such as users, computers and
services and group names.
Email server, login info, encoded passwords,
Network addresses, architecture, OS, owner
Named object Value
Service address, version no.
Group Mailing lists, group1, group2,...
SNS basic design requirements
Specify the Types of named objects:
users, services, computers and group names and directories.
Other types of objects may be integrated;
The names are used only within the organization;
Efficient name lookup;
everyone can read but Authorized write;
DNS - The Internet Domain Name System
A distributed naming database (specified in RFC 1034/1305)
Name structure reflects administrative structure of the Internet
Rapidly resolves domain names to IP addresses
exploits caching heavily
typical query time ~100 milliseconds
Scales to millions of computers
Basic DNS algorithm for name resolution (domain name -> IP number)
• Look for the name in the local cache
• Try a superior DNS server, which responds with:
– another recommended DNS server
– the IP address (which may not be entirely up to date)
DNS server functions and configuration
Main function is to resolve domain names for computers, i.e. to get
their IP addresses.
reverse resolution - get domain name from IP address
Host information - type of hardware and OS
Well-known services - a list of well-known services offered by a
Other attributes can be included (optional)
Name tables change infrequently, but when they do, caching can
result in the delivery of stale data.
Clients are responsible for detecting this and recovering
Its design makes changes to the structure of the name space
difficult. For example:
merging previously separate domain trees under a new root
moving subtrees to a different part of the structure (e.g. if
Scotland became a separate country, its domains should all be
moved to a new country-level domain.)