2. Distributed file system
A file system is a subsystem of an operating system that performs file management activities such as the organization, storing, retrieval, naming, sharing, and protection of files.
It is designed to allow programs to use a set of operations that characterize the file abstraction and to free programmers from concerns about the details of space allocation and the layout of secondary storage.
3. A distributed file system provides a similar abstraction to the users of a distributed system and makes it convenient for them to use files in a distributed environment. In addition, it typically supports:
Remote information sharing
User mobility
Availability
Diskless workstations
4. A distributed file system typically provides the
following three types of services:
1) Storage Service
2) True file service
3) Name service
5. Desirable features
Transparency
User mobility
Performance
Simplicity and ease of use
Scalability
High availability
High reliability
Data integrity
Security
Heterogeneity
6. Transparency: A distributed file system should support the following four types of transparency:
1) Structure transparency
2) Access transparency
3) Naming transparency
4) Replication transparency
User mobility: In a distributed system, a user
should not be forced to work on a specific node
but should have the flexibility to work on
different nodes at different times.
Performance: The performance of a file system is
usually measured as the average amount of time
needed to satisfy client requests.
7. Simplicity and ease of use: Several issues influence the simplicity and ease of use of a distributed file system.
Scalability: It is inevitable that a distributed system will grow with time, since expanding the network by adding new machines or interconnecting two networks is commonplace.
High availability: A distributed file system
should continue to function even when partial
failures occur due to the failure of one or more
components.
8. High reliability: In a good distributed file
system, the probability of loss of stored data
should be minimized as far as practicable.
Data integrity: A file is often shared by multiple users, so the integrity of the data stored in it must be guaranteed even when concurrent access requests are made.
Security: A distributed file system should be
secure so that its users can be confident of the
privacy of their data.
Heterogeneity: As a consequence of large
scale, heterogeneity becomes inevitable in
distributed systems.
9. File models
Different file systems use different conceptual
models of a file. The two most commonly used
criteria for file modeling are structure and
modifiability.
1) Unstructured and structured files
2) Mutable and immutable files
10. Unstructured and structured files
According to the simplest model, a file is an unstructured sequence of data. In this model, there is no substructure known to the file server, and the contents of each file of the file system appear to the file server as an uninterpreted sequence of bytes.
The operating system is not interested in the information stored in the files. Hence, the interpretation of the meaning and structure of the data stored in the files is entirely up to the application programs. UNIX and MS-DOS use this file model.
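As a minimal sketch of the unstructured model (Python, with a hypothetical file name), the file system exposes only byte offsets; any interpretation of the bytes is up to the application:

```python
import struct

# Create a throwaway file so the sketch is self-contained.
with open("data.bin", "wb") as f:        # hypothetical file name
    f.write(bytes(range(200)))

# Unstructured model: the file is an uninterpreted byte sequence, and
# positions are plain byte offsets, not record numbers.
with open("data.bin", "rb") as f:
    f.seek(128)                          # address by byte offset
    chunk = f.read(4)                    # read raw, meaningless bytes

# Only the application decides what the bytes mean, e.g. a
# little-endian 32-bit integer.
(value,) = struct.unpack("<i", chunk)
print(value)
```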
11. Another file model, which is rarely used nowadays, is the structured file model. In this model, a file appears to the file server as an ordered sequence of records.
Records of different files of the same file system can be of different sizes. Therefore, many types of files exist in a file system, each having different properties.
In this model, a record is the smallest unit of file data that can be accessed, and the file system's read and write operations are carried out on a set of records.
12. Structured files are again of two types: files with nonindexed records and files with indexed records.
In the former model, a file record is accessed by specifying its position within the file, for example, the fifth record from the beginning of the file or the second record from the end of the file.
In the latter model, records have one or more key fields and can be addressed by specifying the values of those key fields.
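A small illustrative sketch of the two record-access styles; the records and the key field `id` are made up for the example:

```python
# Structured model: the unit of access is a record, not a byte.
records = [
    {"id": 17, "name": "alice"},
    {"id": 42, "name": "bob"},
    {"id": 99, "name": "carol"},
]

# Nonindexed records: a record is addressed by its position in the file.
second_from_start = records[1]           # "the second record from the beginning"
second_from_end = records[-2]            # "the second record from the end"

# Indexed records: an index on the key field lets records be addressed
# by specifying the value of that key field.
index_by_id = {rec["id"]: rec for rec in records}
print(index_by_id[42])                   # -> {'id': 42, 'name': 'bob'}
```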
13. Mutable and immutable files
According to the modifiability criterion, files are of two types: mutable and immutable.
Most existing operating systems use the mutable file model. In this model, an update performed on a file overwrites its old contents to produce the new contents. That is, a file is represented as a single stored sequence that is altered by each update operation.
14. On the other hand, some more recent file systems, such as the Cedar File System, use the immutable file model.
In this model, a file cannot be modified once it has been created, except to be deleted.
The file versioning approach is normally used to implement file updates, with each file represented by a history of immutable versions. That is, rather than updating the same file in place, each update creates a new version, and the older versions are retained unchanged.
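A minimal sketch of immutable files with versioning, assuming a simple in-memory version history (this is illustrative, not the Cedar File System interface):

```python
class ImmutableFile:
    """Each update creates a new version; old versions are never altered."""

    def __init__(self, contents: bytes):
        self.versions = [contents]        # version history, oldest first

    def update(self, new_contents: bytes) -> int:
        self.versions.append(new_contents)
        return len(self.versions) - 1     # number of the newly created version

    def read(self, version: int = -1) -> bytes:
        return self.versions[version]     # default: the latest version

f = ImmutableFile(b"first draft")
f.update(b"second draft")
assert f.read(0) == b"first draft"        # the old version is retained unchanged
assert f.read() == b"second draft"
```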
15. File-accessing models
The manner in which a client's request to access a file is serviced depends on the file-accessing model used by the file system.
The file-accessing model of a distributed file system mainly depends on two factors: the method used for accessing remote files and the unit of data access.
16. Accessing Remote Files
A distributed file system may use one of the
following models to service a client’s file access
request when the accessed file is a remote file:
1) Remote service model
2) Data-caching model
17. 1) Remote service model: In this model, the processing of the client's request is performed at the server's node. That is, the client's request for file access is delivered to the server, the server machine performs the access request, and finally the result is forwarded back to the client.
2) Data-caching model: In the remote service model, every remote file access request results in network traffic. The data-caching model attempts to reduce the amount of network traffic by taking advantage of the locality feature found in file accesses.
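The sketch below contrasts the two models; `fetch_from_server` is a hypothetical stand-in for a remote-service request, and the local cache exploits locality so repeated accesses cause no network traffic:

```python
cache: dict[str, bytes] = {}             # client-side cache of file data

def fetch_from_server(name: str) -> bytes:
    # Remote service model: the request travels to the server's node,
    # the server performs the access, and the result comes back.
    print(f"network: fetching {name} from the server")
    return b"file contents"

def read_file(name: str) -> bytes:
    # Data-caching model: only a cache miss generates network traffic.
    if name not in cache:
        cache[name] = fetch_from_server(name)
    return cache[name]                   # a hit is served entirely locally

read_file("report.txt")                  # first access goes over the network
read_file("report.txt")                  # locality: repeat access is local
```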
18. File-sharing Semantics
The strongest feature of DFS is that, in spite of using the data-caching model, it supports the single-site UNIX semantics. That is, every read operation on a file sees the effects of all previous write operations performed on that file. This is achieved in the manner described below.
Recall that each DFS server has a component called the token manager. The job of the token manager is to issue tokens to clients for file access requests and to keep track of which clients have been issued which tokens.
19. To maximize performance, DFS achieves better concurrency by using the following techniques:
1) Type-specific tokens
2) Fine-grained tokens
20. Every token has an expiration time of 2 minutes.
Therefore, if a client does not respond to a token
revocation message from a server, the server just
waits for 2 minutes and then acts as if the token
has been returned by the client.
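A rough sketch of this expiry rule in Python, with illustrative names (not the actual DFS token manager interface) and the 2-minute lifetime quoted above:

```python
import time

TOKEN_LIFETIME = 120.0                   # every token expires after 2 minutes

class Token:
    def __init__(self, client: str, kind: str):
        self.client = client             # which client holds the token
        self.kind = kind                 # type-specific, e.g. "read" / "write"
        self.issued_at = time.monotonic()

    def expired(self) -> bool:
        return time.monotonic() - self.issued_at > TOKEN_LIFETIME

def may_treat_as_returned(token: Token, client_responded: bool) -> bool:
    """The server's view after sending a revocation message."""
    if client_responded:
        return True                      # the client handed the token back
    return token.expired()               # otherwise wait out the lifetime
```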
Another important difference between NFS and
DFS that is worth mentioning here is that NFS
allows every machine to be both a client and a
server at the same time. That is, any client may
also be a server and any server may also be a
client.
21. File-caching scheme in DFS
In DFS, recently accessed file data are cached by the cache manager of client machines. The local disk of a client machine is used for this purpose. However, on a diskless client machine, the local memory is used for caching file data.
The cache validation scheme is also based on the token mechanism: cached file data is invalidated when the client receives a token revocation message for that file data from the file server.
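Sketching the validation step, assuming a client-side cache keyed by filename (names are illustrative):

```python
client_cache: dict[str, bytes] = {"report.txt": b"cached contents"}

def on_token_revocation(filename: str) -> None:
    # A revocation message from the file server invalidates the cached
    # data, so the next access must refetch it from the server.
    client_cache.pop(filename, None)

on_token_revocation("report.txt")
assert "report.txt" not in client_cache
```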
22. Replication
Distributed File Service provides the facility to replicate files on multiple file servers. The unit of replication is a fileset; that is, all files of a fileset are replicated together.
The existence of multiple replicas of a file is transparent to normal users. That is, a filename is mapped to all file servers having a replica of the file, so, given a filename, the system can locate every replica of the corresponding file.
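A toy sketch of replica lookup, assuming an in-memory mapping from filename to fileset to replica servers (all names are made up):

```python
import random

fileset_of = {"/proj/report.txt": "proj.fileset"}          # filename -> fileset
replicas_of = {"proj.fileset": ["srvA", "srvB", "srvC"]}   # fileset -> servers

def servers_for(filename: str) -> list[str]:
    # The whole fileset is replicated together, so every file in it
    # has the same set of replica servers.
    return replicas_of[fileset_of[filename]]

def pick_read_replica(filename: str) -> str:
    return random.choice(servers_for(filename))   # any replica can serve a read

print(pick_read_replica("/proj/report.txt"))
```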
23. Fault tolerance
In addition to allowing replication of filesets, another important feature of DFS that helps improve its fault tolerance is the use of the write-ahead log approach for recording file updates in a recoverable manner.
In DFS, for every update made to a file, a log is written to the disk. A log entry contains the old value and the new value of the modified part of the file.
24. When the system comes up after a crash, the log is used to check which changes have already been made to the file and which changes have not yet been made. Those that have not been made are the ones that were lost due to the system crash. These changes are now made to the file to bring it to a consistent state.
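A simplified sketch of the write-ahead log idea, with in-memory stand-ins for the on-disk log and file; a real system would force the log entry to disk before touching the file, and would use log sequence numbers rather than value comparison to decide what to redo:

```python
log: list[dict] = []                      # stands in for the on-disk log
file_blocks = {0: b"old-0", 1: b"old-1"}  # stands in for the file on disk

def log_update(block: int, new: bytes) -> None:
    # The log entry, holding both old and new values, is written first;
    # the file itself would be updated afterwards.
    log.append({"block": block, "old": file_blocks[block], "new": new})

def recover() -> None:
    # After a crash, redo every logged change not yet in the file.
    for entry in log:
        if file_blocks[entry["block"]] != entry["new"]:
            file_blocks[entry["block"]] = entry["new"]

log_update(0, b"new-0")                   # the log entry reaches the disk...
# ...but the system crashes before block 0 of the file is rewritten.
recover()
assert file_blocks[0] == b"new-0"         # the lost update has been redone
```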
25. Atomic transaction
The DCE does not provide a transaction processing facility, either as a part of DFS or as an independent component. This is mainly because DCE currently does not possess the services needed for developing and running mission-critical, distributed on-line transaction processing (OLTP) applications.
For instance, OLTP applications require guaranteed data integrity, an application programming interface with simplified transaction semantics, and the ability to extend programs to support transactional RPCs that allow multiple operations to be executed as a single atomic unit. Transaction processing environments such as Encina, from Transarc, are built on top of DCE to provide these facilities.
26. In addition to Encina, two other transaction processing environments gaining popularity are Customer Information Control System (CICS) and Information Management System (IMS), both from IBM.
CICS is already being used by more than 20,000
customers in more than 90 countries worldwide.
Encina offers interoperability with IBM’s CICS.
CICS can be implemented on top of the DCE and
Encina technology.
27. Design Principles
Based on experience with distributed file systems, Satyanarayanan has stated the following general principles for designing distributed file systems:
1) Clients have cycles to burn
2) Cache whenever possible
3) Exploit usage properties
4) Minimize systemwide knowledge and change
5) Trust the fewest possible entities
6) Batch if possible