Distributed File Systems
Presented By
Dr. A. ASHOK KUMAR
Assistant Professor,
Department Of Computer Science,
Alagappa Government Arts College,
Karaikudi – 630003.
ashokamjuno@rediffmail.com
INTRODUCTION
 A file is a named object
 The two main purposes of using files are as
follows:
1. Permanent storage of information - storing a file
on a secondary storage media
2. Sharing of information - a file can be created by
one application and then shared with different
applications
 A file system is a subsystem of an operating
system that performs file management
activities
 organization, storing, retrieval, naming,
sharing, and protection of files
 A distributed file system enables the users of a
distributed system to use files in a distributed
environment
 Its design and implementation are more complex
than those of a conventional file system
 A distributed file system supports the
following:
1. Remote information sharing
 a file can be transparently accessed by processes on any
node of the system, irrespective of the file's location
2. User mobility
 a user should not be forced to work on a specific node but
should have the flexibility to work on different nodes at
different times
 This property is desirable because of node failures and
because users may need to work at different places
3. Availability
 For better fault tolerance, files should be available for use even
in the event of temporary failure of one or more nodes of the
system
 a distributed file system keeps multiple copies of a file on
different nodes of the system
 Each copy is called a replica of the file
4. Diskless workstations
 A distributed file system, with its transparent remote file-
accessing capability, allows the use of diskless workstations in
a system
 A distributed file system typically provides the
following three types of services
◦ Storage service
 allocation and management of space on a secondary storage
device
 the storage service is also known as the disk service
 most systems allocate disk space in units of fixed-size blocks,
so in these systems the storage service is also known as the
block service
◦ True file service
 It is concerned with the operations on individual files
 operations for accessing and modifying the data in files and for
creating and deleting files
 typical design issues of a true file service component
 file-accessing mechanism,
 file-sharing semantics,
 file-caching mechanism,
 file replication mechanism,
 concurrency control mechanism,
 data consistency and multiple copy update protocol,
 access control mechanism
◦ Name service
 mapping between text names for files and references to files,
that is, file IDs
 file systems use directories to perform this mapping
 the name service is also known as a directory service
 creation and deletion of directories,
 adding a new file to a directory,
 deleting a file from a directory,
 changing the name of a file,
 moving a file from one directory to another
DESIRABLE FEATURES OF A GOOD DISTRIBUTED FILE SYSTEM
1. Transparency
◦ Structure transparency
 a distributed file system uses multiple file servers
 the multiplicity of file servers should be transparent to the
clients of a distributed file system
◦ Access transparency
 the file system interface should not distinguish between
local and remote files
◦ Naming transparency
 The name of a file should give no hint as to where the file
is located
◦ Replication transparency
 the existence of multiple copies and their locations should
be hidden from the clients
2. User mobility
◦ a user should have the flexibility to work on different nodes
at different times
3. Performance
◦ The performance of a file system is measured as the average
amount of time needed to satisfy client requests
◦ In centralized file systems,
 this time includes the time for accessing the secondary storage device
on which the file is stored and the CPU processing time
◦ In a distributed file system,
 this time also includes network communication overhead when the
accessed file is remote
4. Simplicity and ease of use
◦ the user interface to the file system must be simple and the
number of commands should be as small as possible
◦ the semantics of a distributed file system should be the
same as that of a conventional centralized time-sharing system
5. Scalability
◦ a good distributed file system should be designed to
easily cope with the growth of nodes and users in the
system
◦ A scalable design should withstand high service load,
accommodate growth of the user community, and
enable simple integration of added resources
6. High availability
◦ in the event of partial failures, the file system should
continue to function, possibly with some degradation in
performance and functionality
◦ Replication of files at multiple servers is the primary
mechanism for providing high availability
7. High reliability
◦ the probability of loss of stored data should be
minimized
◦ The file system should automatically generate backup
copies of critical files
8. Data integrity
◦ A file is often shared by multiple users
◦ the file system must guarantee the integrity of data
stored in it
◦ concurrent access requests from multiple users must be
properly synchronized
◦ Atomic transactions are a high-level concurrency
control mechanism commonly used for this purpose
9. Security
◦ protect information stored in a file system against
unauthorized access
10. Heterogeneity
◦ a heterogeneous distributed file system gives its users the
flexibility to use different computer platforms for different
applications
◦ it allows a variety of workstations to participate in the
sharing of files via the distributed file system
◦ another dimension of heterogeneity is the ability to
accommodate several different storage media
FILE MODELS
 The two criteria for file modeling are structure
and modifiability
 Unstructured and Structured Files
◦ In the unstructured file model, a file is an unstructured
sequence of data
 there is no substructure known to the file server
 the contents of each file of the file system appear to the file
server as an uninterpreted sequence of bytes
 the interpretation of the meaning and structure of the data
stored in the files is entirely up to the application programs
◦ In the structured file model
 a file appears to the file server as an ordered sequence of
records
 records of different files of the same file system can be of
different sizes
 a record is the smallest unit of file data that can be accessed
 the file system read or write operations are carried out on a
set of records
 Structured files are again of two types
◦ nonindexed records
 a file record is accessed by specifying its position within the file
◦ indexed records
 records have one or more key fields and can be addressed by
specifying the values of the key fields
 modern operating systems use the unstructured file
model
 different applications can interpret the contents of a
file in different ways
 files also normally have attributes
 A file's attributes are information describing that file
 Each attribute has a name and a value
◦ Attributes of a file: owner, size, access permissions, date of
creation, date of last modification, and date of last access
 File attributes are normally maintained and used by
the directory service
Mutable And Immutable Files
 Most existing operating systems use the mutable
file model
◦ an update performed on a file overwrites its old
contents to produce the new contents
 immutable file model
◦ a file cannot be modified once it has been created
except to be deleted
◦ The file versioning approach is normally used to
implement file updates
◦ a new version of the file is created each time a change
is made to the file contents
◦ the immutable file model suffers from two potential
problems
 increased use of disk space
 increased disk allocation activity
FILE-ACCESSING MODELS
 The file-accessing model of a distributed file system mainly depends on
two factors:
◦ the method used for accessing remote files
◦ the unit of data access
 Accessing Remote Files
 Remote service model
◦ the processing of the client's request is performed at the server's node
◦ the file server interface and the communication protocols must be designed
carefully to minimize the overhead of generating messages
 Data-caching model
◦ Used to reduce the amount of network traffic by taking advantage of the
locality feature found in file accesses
◦ if the data needed to satisfy the client's access request is not present locally,
it is copied from the server's node to the client's node and is cached there
◦ The client's request is then processed on the client's node itself by using the
cached data
◦ A replacement policy, such as the least recently used (LRU) policy, is used to
keep the cache size bounded
◦ when the locally cached copy of the data is modified, the changes must
eventually be made to the original file at the server node and to other caches
holding the data
 The problem of keeping the cached data consistent with the original file
content is referred to as the cache consistency problem
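The client side of the data-caching model can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; fetch_from_server is an assumed stand-in for the RPC to the file server, and LRU replacement keeps the cache bounded.

from collections import OrderedDict

class ClientFileCache:
    """Client-side cache for the data-caching model with LRU replacement."""

    def __init__(self, fetch_from_server, capacity=64):
        self.fetch = fetch_from_server   # hypothetical RPC to the file server
        self.capacity = capacity
        self.cache = OrderedDict()       # key -> data, ordered by recency

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # cache hit: served locally
            return self.cache[key]
        data = self.fetch(key)           # cache miss: one network access
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used entry
        return data

A client would construct it as, for example, ClientFileCache(lambda k: server.read_block(k)), where server is whatever stub the system provides.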
Unit of Data Transfer
 Unit of data transfer refers to the fraction (or its multiples) of a
file data that is transferred to and from clients as a result of a
single read or write operation
 The four commonly used data transfer models:
1. File-level transfer model
2. Block-level transfer model
3. Byte-level transfer model
4. Record-level transfer model
 File-level transfer model
◦ when an operation requires file data to be transferred across the
network between a client and a server, the whole file is moved
◦ this model has several advantages
 transmitting an entire file on request is more efficient than transmitting it page
by page, since a single request avoids the overhead of multiple requests
 it has better scalability because it requires fewer accesses to file servers -
reduced server load and network traffic
 disk access routines on the servers can be better optimized
 entire file is cached at a client's site, it becomes immune to server and network
failures
◦ Its drawback is that it requires storage space on the client's node for storing
all the required files
 Block-level transfer model
◦ file data transfers between a client and a server in
units of file blocks
◦ A file block is a contiguous portion of a file and is
usually fixed in length
◦ when the block size is equal to the virtual memory page
size, this model is also called a page-level transfer model
◦ Its advantage is that it does not require client nodes to
have large storage space
◦ it is therefore used in systems having diskless workstations
◦ when an entire file is to be accessed,
 multiple server requests are needed in this model,
 resulting in more network traffic and more network
protocol overhead
 Byte-level transfer model
◦ file data transfers between a client and a server
take place in units of bytes
◦ provides maximum flexibility because
 it allows storage and retrieval of an arbitrary sequential
subrange of a file,
 specified by an offset within a file,
 and a length.
◦ Its drawback is the difficulty of cache management
due to the variable-length data of different access
requests
 Record-level transfer model
◦ In this model file contents are structured in the
form of records
◦ file data transfers between a client and a server
take place in units of records
FILE-SHARING SEMANTICS
 The file-sharing semantics adopted by a file system specify
when modifications of file data made by one user become
visible to other users
 UNIX semantics
◦ this semantics enforces an absolute time ordering on all
operations
◦ every read operation on a file sees the effects of all
previous write operations performed on that file
◦ it is easy to implement in file systems for single-processor
systems, where all read/write requests can be serialized
◦ in a distributed system, however, network delays may cause
client requests from different nodes to arrive and get
processed at the server in a different order
◦ therefore, distributed file systems normally implement a
more relaxed semantics of file sharing
 Session semantics
◦ A session is a series of file accesses made between the
open and close operations
◦ all changes made to a file during a session are initially
made visible only to the client process
◦ once the session is closed, the changes made to the file
are made visible to remote processes
◦ with session semantics, multiple clients are allowed to
perform both read and write accesses concurrently on
the same file
◦ each client maintains its own image of the file
◦ When a client closes its session, all other remote
clients who continue to use the file are actually using a
stale copy of the file
◦ session semantics should be used only with those file
systems that use the file-level transfer model
 Immutable shared-files semantics
◦ It is based on the use of the immutable file model
◦ once the creator of a file declares it to be sharable, the file is
treated as immutable
◦ it cannot be modified any more
◦ Changes to the file are handled by creating a new updated
version of the file
 Transaction-like semantics
◦ based on the transaction mechanism
◦ A transaction is a set of operations enclosed between a
pair of begin_transaction and end_transaction
operations
◦ partial modifications made to the shared data by a
transaction will not be visible to other concurrently
executing transactions until the transaction ends
◦ in this simple form, the beginning and end of a transaction
are implicit in the open and close file operations, and a
transaction can involve only one file
FILE-CACHING SCHEMES
 File caching has been implemented in centralized time-sharing systems
to improve file I/O performance
 file caching is to retain recently accessed file data in main memory
 repeated accesses to the same information can be handled without
additional disk transfers
 file caching reduces disk transfers
 file-caching scheme for a distributed file system may also contribute to
its
◦ scalability and
◦ reliability
 a file-caching scheme for a centralized file system involves several key
decisions
◦ Granularity of cached data (large versus small),
◦ Cache size (large versus small, fixed versus dynamically changing),
◦ And the replacement policy
 a file-caching scheme for a distributed file system must also address the
following key decisions
◦ Cache location
◦ Modification propagation
◦ Cache validation
Cache location
 Cache location refers to the place where the cached data is
stored
 there are three possible cache locations in a distributed file
system
◦ Server's main memory
◦ Client's disk
◦ Client's main memory
1. Server's main memory
 When no caching scheme is used, before a remote
client can access a file,
◦ the file must first be transferred from the server's disk to
the server‘s main memory
◦ and then across the network from the server's main
memory to the client‘s main memory
 the total cost involved is one disk access and one
network access
 A cache located in the server's main memory
eliminates the disk access cost on a cache hit
 The decision to locate the cache in the server's main
memory may be due to the following reasons
◦ It is easy to implement and is totally transparent to the
clients.
◦ It is easy to always keep the original file and cached data
consistent since both reside on the same node
 however, a cache in the server's main memory still involves a
network access for each file access operation
2. Client's disk
 A cache located in a client's disk eliminates the network
access cost but requires a disk access on a cache hit
 A cache on a disk has several advantages
◦ Reliability: cached data are not lost in a crash
 the data is still there during recovery
 and there is no need to fetch it again from the server's node
◦ Large storage capacity
 compared to a main-memory cache, a disk cache has plenty of storage
space
 many distributed file systems that use a client's disk cache
employ the file-level transfer model, in which a file is always
cached in its entirety
 A drawback of having cached data on a client's disk is that it
does not work if the system is to support diskless
workstations
 server's main-memory cache eliminates disk access but
requires network access on a cache hit.
 a client's disk cache eliminates network access but
requires disk access on a cache hit
3. Client's main memory
 A cache located in a client's main memory
 eliminates both network access cost and disk
access cost
 It permits workstations to be diskless
 However, a client's main-memory cache is not
preferable to a client's disk cache when a large cache
size and increased reliability of cached data are desired
Modification Propagation
 When the caches of all nodes contain exactly the same
copies of the file data, the caches are consistent
 caches become inconsistent when the file data is changed
by one of the clients and the copies in the caches of other
nodes are not correspondingly changed or discarded
 file data cached at multiple client nodes must therefore be
kept consistent
 several schemes to handle this issue have been proposed
and implemented
 they deal with the following cache design issues for
distributed file systems
1. When to propagate modifications made to cached
data to the corresponding file server
2. How to verify the validity of cached data
 The modification propagation schemes are
◦ Write-through Scheme
◦ Delayed-Write Scheme
Write-through Scheme
 when a cache entry is modified,
◦ The new value is immediately sent to the server
◦ The server updates the master copy of the file
 two main advantages
◦ high degree of reliability
◦ and suitability for UNIX-like semantics
 the risk of updated data getting lost (when a
client crashes) is very low
 drawback of this scheme is
◦ its poor write performance
◦ each write access has to wait until the information
is written to the master copy of the server
Delayed-Write Scheme
 the write-through scheme helps on reads, but to reduce
network traffic for writes the delayed-write scheme is
used
 when a cache entry is modified,
◦ The new value is written only to the cache
◦ and the client just makes a note that the cache entry
has been updated
◦ Some time later, all updated cache entries
corresponding to a file are gathered together
and sent to the server in a single batch
 delayed-write policies are of different types
1. Write on ejection from cache
2. Periodic write
3. Write on close
 Write on ejection from cache
◦ modified data in a cache entry is sent to the server when
 the cache replacement policy has decided to eject it from the client's
cache
 Periodic write
◦ The cache is scanned periodically, at regular intervals, and
any cached data that have been modified since the last
scan are sent to the server
 Write on close
◦ The modifications made to a cached data by a client are
sent to the server when the corresponding file is closed by
the client
◦ The write-on-close policy is a perfect match for the
session semantics
◦ however, the close operation takes a long time because all modified
data must be written to the server before the operation
completes
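The two propagation schemes can be contrasted in a short sketch. This is an illustration only: the server object and its write method are assumptions, standing in for the actual remote update call.

class WriteThroughCache:
    """Every modification is sent to the server immediately (write-through)."""
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, key, value):
        self.cache[key] = value
        self.server.write(key, value)   # master copy updated before returning


class WriteOnCloseCache:
    """Modifications are buffered and flushed when the file is closed
    (delayed-write, write-on-close variant)."""
    def __init__(self, server):
        self.server, self.cache, self.dirty = server, {}, set()

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)             # only note that the entry is updated

    def close(self):
        for key in self.dirty:          # gather all updates, send in one batch
            self.server.write(key, self.cache[key])
        self.dirty.clear()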
Cache Validation Schemes
 It is necessary to verify if the data cached at a
client node is consistent with the master copy
 If not, the cached data must be invalidated and
the updated version of the data must be fetched
again from the server
 There are basically two approaches
1. Client-Initiated Approach
2. Server-Initiated Approach
1. Client-Initiated Approach
◦ A client contacts the server and checks whether its
locally cached data is consistent with the master copy
◦ The file-sharing semantics depends on the frequency
of the validity check
◦ One of the following approaches may be used
◦ Checking before every access.
 This approach defeats the main purpose of caching
 because the server has to be contacted on every access.
 But it is suitable for supporting UNIX-like semantics
◦ Periodic checking.
 In this method, a check is initiated every fixed interval of
time
 The main problem of this method is that it results in fuzzier
file-sharing semantics
 Because the data on which an access operation is performed
is timing dependent
◦ Check on file open
 In this method, a client's cache entry is validated only when
the client opens the corresponding file for use
 This method is suitable for supporting session semantics
 a common way of implementing session semantics in a
distributed file system is
 to use the file-level transfer model coupled with the write-on-close
modification propagation policy and the check-on-file-open cache
validation policy
2. Server-Initiated Approach
 In this method,
◦ a client informs the file server when opening a file, indicating
whether the file is being opened for reading, writing, or both
◦ The file server keeps a record of which client has which file
open and in what mode
◦ the server keeps monitoring the file usage modes being used by
different clients
◦ and reacts whenever it detects a potential for inconsistency.
◦ A potential for inconsistency occurs when two or more clients
try to open a file in conflicting modes
◦ For example,
 if a file is open for reading, other clients may be allowed to open it for
reading without any problem,
 But opening it for writing cannot be allowed
◦ a new client should not be allowed to open a file in any mode if
the file is already open for writing
◦ When a client closes a file,
 it sends an intimation to the server along with any modifications made
to the file.
 On receiving such an intimation, the server updates its record of which
client has which file open in what mode
◦ server-initiated approach has the following
problems
 It violates the traditional client-server model in which
servers simply respond to service request activities
initiated by clients
 It requires that file servers be stateful
 A check-on-open, client-initiated cache validation
approach must still be used along with the server-
initiated approach
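The bookkeeping a server-initiated scheme needs can be sketched as follows. This is a toy, with illustrative names: the server records which client has which file open in what mode and rejects opens that create a potential for inconsistency (a real server might queue such requests instead).

class FileServer:
    """Tracks open files per client and rejects conflicting open modes."""
    def __init__(self):
        self.open_files = {}   # filename -> list of (client_id, mode)

    def open(self, client_id, filename, mode):
        entries = self.open_files.setdefault(filename, [])
        # A file already open for writing may not be opened again in any
        # mode, and a file open for reading may not be opened for writing.
        if any(m == "write" for _, m in entries):
            raise PermissionError(filename + " already open for writing")
        if mode == "write" and entries:
            raise PermissionError(filename + " already open for reading")
        entries.append((client_id, mode))

    def close(self, client_id, filename):
        # On close, the server updates its record of open files.
        entries = self.open_files.get(filename, [])
        self.open_files[filename] = [
            (c, m) for c, m in entries if c != client_id]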
FILE REPLICATION
 A replicated file is a file that has multiple copies
 each copy located on a separate file server
 Each copy of the set of copies that comprises a
replicated file is referred to as a replica of the
replicated file
 Difference between Replication and Caching
◦ A replica is associated with a server, whereas a cached copy is
normally associated with a client
◦ The existence of a replica normally depends on availability and
performance requirements, whereas the existence of a cached copy
is primarily dependent on the locality in file access patterns
◦ A replica is more persistent, widely known, secure, available,
complete, and accurate, whereas a cached copy is not persistent
◦ A cached copy is contingent upon a replica; only by periodic
revalidation with respect to a replica can a cached copy be useful
Advantages of Replication
1. Increased availability
◦ the system remains operational and available to the
users despite failures
◦ By replicating critical data on servers with
independent failure modes, the probability that one
copy of the data will be accessible increases
◦ alternate copies of a replicated data can be used
when the primary copy is unavailable
2. Increased reliability
◦ Due to the presence of redundant information in the
system, recovery from catastrophic failures becomes
possible
3. Improved response time
◦ Replication also helps in improving response time
◦ it enables data to be accessed either locally or from a
node whose access time is lower than that of the primary
copy
4. Reduced network traffic
◦ If a file's replica is available with a file server that resides
on a client's node,
◦ the client's access requests can be serviced locally
5. Improved system throughput
◦ Replication also enables several clients' requests for
access to the same file to be serviced in parallel by
different servers
6. Better scalability
◦ By replicating the file on multiple servers,
◦ the same requests can now be serviced more efficiently
by multiple servers due to workload distribution
7. Autonomous operation
◦ in a distributed system that provides file replication as a
service to its clients,
◦ all files required by a client for operation during a limited
time period may be replicated on the file server residing at
the client's node, allowing the node to operate autonomously
for that period
Replication Transparency
 A replicated file service must function
exactly like a non replicated file service
 replication of files should be designed to be
transparent to the users
 multiple copies of a replicated file appear as
a single logical file to its users
 the read, write, and other file operations
should have the same client interface
 Two important issues related to replication
transparency are
◦ naming of replicas
◦ and replication control
Naming of Replicas
 the replication transparency requirement calls for
the assignment of a single identifier to all replicas of
an object
 assigning a single identifier to all replicas is straightforward
for immutable objects, because all replicas are always identical
and there is logically only one object with a given identifier
 for mutable objects, however, different copies of a replicated
object may not be the same (consistent) at a
particular instant of time
 if all replicas are consistent, the mapping must
provide the locations of all replicas
 and a mechanism to identify the relative distances of
the replicas from the user's node, so that the nearest replica
can be chosen
 for mutable objects, it is
◦ the responsibility of the naming system to map a user-
supplied identifier into the appropriate replica of the
object
Replication Control
 Replication control includes determining the
number and locations of replicas of a replicated file
 normally, replication control is handled entirely
automatically, in a user-transparent manner
 under certain circumstances, it is desirable to expose
these details to users and to provide them with the
flexibility to control the replication process
 if replication facility is provided to support
autonomous operation of workstations,
 users should be provided with the flexibility to
create a replica of the desired files on their local
nodes
 Depending on whether replication control is user
transparent or not, the replication process is of two
types
1. Explicit replication
2. Implicit/lazy replication
1. Explicit replication
◦ users are given the flexibility to control the entire replication process
◦ when a process creates a file, it specifies the server on which the file
should be placed
◦ if desired, additional copies of the file can be created on other servers
on explicit request by the users
◦ Users also have the flexibility to delete one or more replicas of a
replicated file
2. Implicit/lazy replication
◦ the entire replication process is automatically controlled by the
system without users' knowledge
◦ when a process creates a file, it does not provide any information
about its location
◦ The system automatically selects one server for the placement of the
file
◦ the system automatically creates replicas of the file on other servers,
based on some replication policy used by the system
◦ The system automatically deletes any extra copies when they are no
longer needed
◦ Lazy replication is normally performed in the background when the
server has some free time
Multicopy Update Problem
 Maintaining consistency among the copies when a
replicated file is updated is the major design issue of a
file system that supports replication of files
 commonly used approaches to handle this
issue are
◦ Read-Only Replication
◦ Read-Any-Write-All Protocol
◦ Available-Copies Protocol
◦ Primary-Copy Protocol
◦ Quorum-Based Protocols
 Read-Only Replication
◦ allows the replication of only immutable files
◦ immutable files are used only in the read-only mode
◦ because mutable files cannot be replicated, the multicopy
update problem does not arise
◦ this approach is suitable for files that are frequently read
and modified only once in a while
 Read-Any-Write-All Protocol
◦ a simple replication scheme that can support the
replication of mutable files is the read-any-write-all
protocol
◦ In this method
 a read operation on a replicated file is performed by reading any
copy of the file and a write operation by writing to all copies of the
file
 Some form of locking has to be used to carry out a write operation
 before updating any copy,
 all copies are locked,
 then they are updated,
 and finally the locks are released to complete the write
 Available-Copies Protocol
◦ The main problem with the read-any-write-all
protocol is
 a write operation cannot be performed if any of the servers
having a copy of the replicated file is down at the time of the
write operation
◦ The available-copies protocol relaxes this restriction
and allows write operations to be carried out
◦ In this method,
 a read operation is performed by reading any available copy,
 but a write operation is performed by writing to all available
copies
◦ The basic idea is
 when a server recovers after a failure,
 It brings itself up to date by copying from other servers
before accepting any user request
 Failed servers (sites) are dynamically detected by high-
priority status management routines
 and configured out of the system while newly recovered
sites are configured back in
 Primary-Copy Protocol
◦ In this protocol,
 for each replicated file, one copy is designated as the primary copy and all
the others are secondary copies.
 Read operations can be performed using any copy, primary or secondary
 But, all write operations are directly performed only on the primary copy
 Each server having a secondary copy updates its copy either
 by receiving notification of changes from the server having the primary
copy
 or by requesting the updated copy from it
◦ for UNIX-like semantics
 when the primary-copy server receives an update request,
 it immediately orders all the secondary-copy servers to update their
copies
◦ A fuzzier consistency semantics
 if a write operation completes as soon as the primary copy has been
updated
 The secondary copies are then lazily updated either in the background or
 when requested for an updated version by their servers
 Quorum-Based Protocols
◦ The read-any-write-all and available-copies protocols
cannot handle the network partition problem, in
which the copies of a replicated file are partitioned
into two or more active groups
◦ a quorum-based protocol is capable of handling the
network partition problem
◦ it can also increase the availability of write operations
◦ A quorum-based protocol works as follows
 there are a total of n copies of a replicated file F
 To read the file, a minimum r copies of F have to be
consulted.
 This set of r copies is called a read quorum
 to perform a write operation on the file, a minimum w copies
of F have to be written
 This set of w copies is called a write quorum
 the constraint on the values of r and w is that the sum of the
read and write quorums must be greater than the total number
of copies n (r + w > n)
 the quorum protocol does not require that write
operations be executed on all copies of a
replicated file
 therefore, it becomes necessary to be able to
identify a current (up-to-date) copy in a
quorum
 This is achieved by associating a version
number attribute with each copy
 The version number of a copy is updated every
time the copy is modified
 A copy with the largest version number in a
quorum is current
 The new version number assigned to each copy
is one more than the version number associated
with the current copy
 A read is executed as follows:
1. Retrieve a read quorum (any r copies) of F.
2. Of the r copies retrieved, select the copy with
the largest version number.
3. Perform the read operation on the selected
copy.
 A write is executed as follows:
1. Retrieve a write quorum (any w copies) of F.
2. Of the w copies retrieved, get the version
number of the copy with the largest version
number.
3. Increment the version number.
4. Write the new value and the new version
number to all the w copies of the write
quorum.
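The read and write procedures above translate almost directly into code. A minimal sketch, with hypothetical names (Replica, quorum_read, quorum_write) and in-memory copies standing in for the file servers:

from dataclasses import dataclass
import random

@dataclass
class Replica:                # one copy of replicated file F
    value: str = ""
    version: int = 0          # updated every time the copy is modified

def quorum_read(replicas, r):
    """Retrieve any r copies and read the one with the largest version."""
    quorum = random.sample(replicas, r)
    return max(quorum, key=lambda c: c.version).value

def quorum_write(replicas, r, w, new_value):
    """Write the new value and incremented version number to any w copies."""
    n = len(replicas)
    assert r + w > n, "read and write quorums must overlap (r + w > n)"
    quorum = random.sample(replicas, w)
    version = max(c.version for c in quorum) + 1  # one more than the current copy
    for copy in quorum:
        copy.value, copy.version = new_value, version

Because r + w > n, every read quorum overlaps every write quorum, so a read quorum always contains at least one current copy.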
 Several special protocols can be derived from the generalized
quorum protocol; a few are:
1. Read-any-write-all protocol
◦ The read-any-write-all protocol is actually a special case of
the generalized quorum protocol with r=1 and w = n.
◦ This protocol is suitable for use when the ratio of read to
write operations is large
2. Read-all-write-any protocol
◦ For this protocol r=n and w= 1
◦ This protocol may be used in those cases where the ratio of
write to read operations is large
3. Majority-consensus protocol
◦ In this protocol, the sizes of both the read quorum and the
write quorum are made either equal or nearly equal
4. Consensus with weighted voting
◦ in this protocol, each copy is assigned a number of votes; a read
quorum of r votes is collected to read a file and a write quorum of
w votes to write a file
◦ Since the votes assigned to each copy need not be the same,
◦ the size of a read/write quorum depends on the copies
selected for the quorum
FAULT TOLERANCE
 Various types of faults could harm the integrity of the
data stored
 a processor loses the contents of its main memory in the
event of a crash
◦ making the data that are stored by the file system inconsistent
 during a request processing, the server or client machine
may crash
◦ resulting in the loss of state information of the file being
accessed
 The primary file properties that influence the ability of a
distributed file system to tolerate faults are as follows
1. Availability
◦ Availability of a file refers to the fraction of time for which the
file is available for use
◦ availability property depends on the location of the file and
the locations of its clients (users)
◦ Replication is a primary mechanism for improving the
availability of a file
2. Robustness
◦ Robustness of a file refers to its power to survive crashes
of the storage device and decays of the storage medium
on which it is stored
◦ Storage devices that are implemented by using
redundancy techniques, such as a stable storage device,
are often used to store robust files
◦ a robust file may not be available until the faulty
component has been recovered
◦ robustness is independent of either the location of the file
or the location of its clients
3. Recoverability
◦ Recoverability of a file refers to its ability to be rolled
back to an earlier, consistent state when an operation on
the file fails or is aborted by the client
◦ Atomic update techniques such as a transaction
mechanism are used to implement recoverable files
 Stable Storage
 In context of crash resistance capability, storage may
be broadly classified into three types:
1. Volatile storage, such as RAM, which cannot
withstand power failures or machine crashes
2. Nonvolatile storage, such as a disk, which can
withstand CPU failures but cannot withstand
transient I/O faults and decay of the storage media
3. Stable storage, which can even withstand transient
I/O faults and decay of the storage media
 The basic idea of stable storage is to use duplicate
storage devices to implement a stable device
 ensure that any period when only one of the two
component devices is operational is significantly less
than the mean time between failures (MTBF) of a
stable device
 a disk-based stable-storage system consists of a pair of ordinary
disks (say disk 1 and disk 2) that are assumed to be decay
independent
 Each block on disk2 is an exact copy of the corresponding block on
disk 1
 effective fault tolerance facilities are provided to ensure that both
the disks are not damaged at the same time
 As with conventional disks,
◦ the two basic operations related to a stable disk are read and write.
◦ A read operation first attempts to read from disk 1.
◦ If it fails, the read is done from disk 2.
◦ A write operation writes to both disks, but the write to disk 2 does not
start until that for disk 1 has been successfully completed
◦ This is to avoid the possibility of both disks getting damaged at the
same time by a hardware fault
 recovery action compares the contents of the two disks block by
block
 Whenever two corresponding blocks differ, the block having
incorrect data is regenerated from the corresponding block on the
other disk
 The correctness of a data block depends on the timing when the
crash occurred
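The disk-pair discipline described above can be simulated in a few lines. This sketch keeps both "disks" in memory and, for simplicity, assumes disk 1 holds the correct block whenever the two copies differ; as the slides note, in reality the correct copy depends on when the crash occurred.

class StableDisk:
    """Stable storage built from a pair of decay-independent disks."""
    def __init__(self, n_blocks):
        self.disk1 = [None] * n_blocks
        self.disk2 = [None] * n_blocks

    def read(self, block):
        data = self.disk1[block]            # first attempt the read on disk 1
        return data if data is not None else self.disk2[block]

    def write(self, block, data):
        self.disk1[block] = data            # disk 2 is written only after the
        self.disk2[block] = data            # write to disk 1 has completed

    def recover(self):
        """Compare the two disks block by block after a crash and
        regenerate any damaged block from its counterpart."""
        for i, (b1, b2) in enumerate(zip(self.disk1, self.disk2)):
            if b1 != b2:
                self.disk2[i] = b1          # simplifying assumption: disk 1 is correct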
 Effect of Service Paradigm on Fault Tolerance
 A server may be implemented by using any one of the
following two service paradigms
◦ Stateful
◦ Stateless
 Stateful File Servers
◦ A stateful file server maintains clients' state information
from one access request to the next
◦ This state information is subsequently used when
executing the second request
◦ to allow the server to decide how long to retain the state
information of a client, all access requests for a file by a
client are performed within an open and a close operation,
called a session
◦ The server creates state information for a client when the
client starts a new session by performing an open
operation
◦ discards the state information when the client closes the
session by performing a close operation
◦ To illustrate how a stateful file server works, consider a
server that supports the following operations:
 Open (filename, mode):
◦ This operation is used to open a file identified by filename in the
specified mode
◦ When the server executes this operation, it creates an entry for this file
in a file-table
◦ it uses for maintaining the file state information of all the open files
◦ When a file is opened, its read-write pointer is set to zero
◦ the server returns to the client a file identifier (fid) that is used by the
client for subsequent accesses to that file
 Read (fid, n, buffer):
◦ This operation is used to get n bytes of data from the file identified by fid
into the specified buffer, starting at the current position of the read-write
pointer; the server then advances the pointer by n
 Write (fid, n, buffer):
◦ On execution of this operation, the server takes n bytes of data from the
specified buffer and writes it into the file at the current position of the
read-write pointer, then advances the pointer
 Seek (fid, position):
◦ This operation causes the server to change the value of the read-write
pointer of the file identified by fid to the new value specified as position
 Close (fid):
◦ This statement causes the server to delete from its file-table the file state
information of the file identified by fid
 Stateless File Servers
◦ A stateless file server does not maintain any client
state information
◦ instead, each request identifies the file and the position in the
file for the read/write access
◦ for example, a server supporting the following two operations
on files is stateless:
◦ Read (filename, position, n, buffer):
 the server returns to the client n bytes of data of the file
identified by filename.
 The returned data is placed in the specified buffer
 The position within the file from where to begin reading is
specified as the position parameter
◦ Write (filename, position, n, buffer):
 It takes n bytes of data from the specified buffer and writes it
into the file identified by filename
 The position parameter specifies the byte position within the
file from where to start writing
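For contrast with the stateful sketch above, the stateless counterpart needs no file table at all; every request carries the filename and position (again a sketch, with files mapping filenames to bytearrays):

class StatelessFileServer:
    """Each request carries filename and position; the server keeps
    no per-client state between requests."""
    def __init__(self, files):
        self.files = files   # filename -> bytearray

    def read(self, filename, position, n):
        return self.files[filename][position:position + n]

    def write(self, filename, position, data):
        self.files[filename][position:position + len(data)] = data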
 Advantages of Stateless Service Paradigm in Crash
Recovery
Stateless servers:
◦ Crash recovery is very easy, because no client state information is
maintained by the server and each request contains all the
information that is necessary to complete it
◦ When a server crashes while serving a request, the client need only
resend the request until the server responds; the server does no
crash recovery at all
◦ When a client crashes during request processing, no recovery is
necessary for either the client or the server
◦ The paradigm suffers from the drawbacks of longer request messages
and slower processing of requests, because a stateless server does
not maintain any state information to speed up the processing
Stateful servers:
◦ The stateful service paradigm requires complex crash recovery
procedures; both client and server need to reliably detect crashes
◦ The server needs to detect client crashes so that it can discard any
state it is holding for the client, and the client must detect server
crashes
 The stateless service paradigm, imposes the
following constraints on the design of the
distributed file system
1. Each request of the stateless service paradigm
identifies the file by its filename instead of a low-
level file identifier
 If the translation of remote names to local names is done
for each request, the request processing overhead will
increase.
 To avoid repeating the translation for each request, each file
should have a system-wide unique low-level name associated with it
2. The retransmission of requests by clients requires
that the operations supported by stateless servers
be idempotent
 Self-contained read and write operations are idempotent
 operations to delete a file should also be made idempotent
if the stateless service paradigm is used
ATOMIC TRANSACTIONS
 An atomic transaction is a computation consisting of
a collection of operations that take place indivisibly
in the presence of failures and concurrent
computations
 Transactions help to preserve the consistency of a set
of shared data objects in the face of failures and
concurrent access
 They make crash recovery much easier,
 because a transaction can only end in two states
◦ Transaction carried out completely or
◦ Transaction failed completely
 Transactions have the following properties:
1. Atomicity
2. Serializability
3. Permanence
1. Atomicity
◦ This property ensures that all the operations of a
transaction appear to have been performed
indivisibly
◦ Two essential requirements for atomicity are
1. Failure atomicity
 ensures that if a transaction's work is interrupted by a
failure, any partially completed results will be undone
 Failure atomicity is also known as the all-or-nothing
property because a transaction is always performed either
completely or not at all
2. Concurrency atomicity
 ensures that while a transaction is in progress, other
processes executing concurrently with the transaction
cannot modify or observe intermediate states of the
transaction
 Concurrency atomicity is also known as consistency property
2. Serializability
◦ This property (also known as isolation property)
ensures that concurrently executing transactions do
not interfere with each other
◦ The concurrent execution of a set of two or more
transactions is serially equivalent
◦ result of performing them concurrently is always the
same as if they had been executed one at a time in
some (system-dependent) order
3. Permanence
◦ This property (also known as durability property)
ensures that once a transaction completes
successfully, the results of its operations become
permanent
◦ And cannot be lost even if the corresponding process
or the processor on which it is running crashes
Need for Transactions In a File Service
 The transactions in a file service is needed for
two main reasons:
1. For improving the recoverability of files in the
event of failures
2. For allowing the concurrent sharing of mutable
files by multiple clients in a consistent manner
Inconsistency Due to System Failure
◦ Consider a banking transaction comprised of four
operations (a1, a2, a3, a4) for transferring $5 from
account X to account Z
 a1: read balance (x) of account X
 a2: read balance (z) of account Z
 a3: write (x - 5) to account X
 a4: write (z + 5) to account Z
◦ If the system fails after a3 has been performed but before
a4 completes, account X will have been debited without
account Z being credited, leaving the accounts in an
inconsistent state
Inconsistency Due to Concurrent Access
 Consider two banking transactions T1 and T2
 Transaction T1, which is meant for transferring $5 from
account X to account Z, consists of four operations a1, a2, a3,
and a4
 Transaction T2, which is meant for transferring $7 from
account Y to account Z, consists of four operations b1, b2, b3,
and b4
 Assume the initial balance in all the accounts is $100
 In a base file service without transaction facility
◦ if the operations corresponding to the two transactions are
allowed to progress concurrently and
◦ if the file system makes no attempt to serialize the execution of
these operations,
◦ unexpected final results may be obtained
 In a file service with transaction facility,
◦ the operations of each of the two transactions can be performed
indivisibly,
◦ producing correct results irrespective of which transaction is
executed first
 Any interleaving of the operations of two or more concurrent
transactions is known as a schedule
 All schedules that produce the same final result as if the
transactions had been performed one at a time in some serial
order are said to be serially equivalent
 Serial equivalence is used as a criterion for the correctness of
concurrently executing transactions (a worked example follows)
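The following worked example, assuming the $100 initial balances stated above, shows one non-serializable interleaving of T1 and T2 losing an update to account Z:

balances = {"X": 100, "Y": 100, "Z": 100}

# T1 transfers $5 from X to Z; T2 transfers $7 from Y to Z.
x = balances["X"]; z1 = balances["Z"]   # T1: a1, a2 (reads)
y = balances["Y"]; z2 = balances["Z"]   # T2: b1, b2 (reads) - stale view of Z
balances["X"] = x - 5                   # T1: a3
balances["Z"] = z1 + 5                  # T1: a4 -> Z = 105
balances["Y"] = y - 7                   # T2: b3
balances["Z"] = z2 + 7                  # T2: b4 -> Z = 107, T1's update is lost

print(balances)  # {'X': 95, 'Y': 93, 'Z': 107}; any serial order would give Z = 112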
 Operations for Transaction-based File Service
 a transaction consists of a sequence of elementary file access
operations such as read and write
 The three essential operations for transaction service are as follows:
 begin_transaction : returns (TID)
◦ Begins a new transaction and returns a unique transaction
identifier (TID)
◦ This identifier is used in other operations of this transaction
◦ All operations within a begin-transaction and an end-transaction
form the body of the transaction
 end_transaction (TID) : returns (status)
◦ This operation indicates that, from the view point of the client, the
transaction completed successfully
◦ The returned status indicates whether the transaction has
committed or is inactive because it was aborted by either the
client or the server
 abort_transaction (TID)
◦ Aborts the transaction, restores any changes made so far within
the transaction to the original values, and changes its status to
inactive
◦ A transaction is normally aborted in the event of some system
failure
 In a file system with transaction facility
◦ Each file access operation of the transaction service
corresponds to an elementary file service operation
◦ The additional parameters of the file access
operations of the transaction service are the
transaction identifier (TID) of the transaction to
which the operation belongs
◦ the following are file access operations for
transaction service of the stateless server for the
byte-stream files
◦ Tread (TID, filename, position, n, buffer)
 Returns to the client n bytes of the tentative data resulting
from the TID
 if any has been recorded; otherwise it has the same effect as
Read (filename, position, n, buffer)
◦ Twrite (TID,filename, position, n, buffer)
 Has the same effect as Write (filename, position, n, buffer)
 but records the new data in a tentative form that is made
permanent only when the TID commits
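A client might drive this transaction service roughly as follows. The sketch is hypothetical: fs stands in for the client stub, the method names mirror the slides' operations, and tread here returns the data directly instead of filling a buffer.

def transfer(fs, amount, src, dst):
    """Move `amount` between two account files indivisibly using the
    transaction-service operations described in the slides."""
    tid = fs.begin_transaction()
    try:
        x = int(fs.tread(tid, src, position=0, n=8))
        z = int(fs.tread(tid, dst, position=0, n=8))
        fs.twrite(tid, src, position=0, n=8, data=str(x - amount))
        fs.twrite(tid, dst, position=0, n=8, data=str(z + amount))
        return fs.end_transaction(tid)  # commit: tentative data made permanent
    except Exception:
        fs.abort_transaction(tid)       # restore any changes to original values
        raise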
Recovery Techniques
 From the point of view of a server, a transaction has
two phases
 The first phase starts
◦ when the server receives a begin_transaction request
from a client
◦ The file access operations in the transaction are
performed, and the client adds changes to file items
progressively
◦ On execution of the end_transaction or abort_transaction
operation, the first phase ends and the second phase
starts
 In the second phase,
◦ the transaction is either committed or aborted
◦ In a commit, the changes made by the transaction to file
items are made permanent
◦ in an abort, the changes made by the transaction to file
items are undone to restore the files
 while a transaction is in its first phase, and hence subject to
abortion, its updates must be recorded in a reversible
manner
 The two commonly used approaches for recording file
updates in a reversible manner are
◦ the file versions approach
◦ the write-ahead log approach
File Versions Approach
 when a transaction begins, the current file version is used
for all file access operations (within the transaction) that
do not modify the file
 When a transaction commits, the changes made by it to a file
become public
 thus the current version of a file is the version produced by the
most recently committed transaction
 When the first operation that modifies the file is
encountered within the transaction,
◦ The server creates a tentative version of the file for the
transaction from the current file version
◦ performs the update operation on this version of the file
◦ all subsequent file access operations within the transaction are
performed on this tentative file version
◦ When the transaction is committed,
 the tentative file version is made the new current version and the
previous current version of the file is added to the sequence of old
versions
◦ if the transaction is aborted,
 the tentative file version is discarded and the current file version
remains unchanged
 a transaction can modify more than one file
 there is a tentative version of each file for the transaction
 when one of the concurrent transactions commits,
◦ The tentative version corresponding to that transaction
becomes the current version of the file
 if there are no serializability conflicts between this
transaction and the previously committed transactions,
◦ the tentative version corresponding to this transaction is
merged with the current version,
◦ creating a new current version that includes the changes made
by all of the transactions that have already committed
 if there are serializability conflicts,
◦ all the transactions that are involved except the first one to
commit are aborted
 A serializability conflict occurs when two or more
concurrent transactions are allowed to access the same
data items in a file and one or more of these accesses is a
write operation
 Shadow Blocks Technique for Implementing File
Versions
 The shadow blocks technique is an optimization that
allows the creation of a tentative version of a file
without the need to copy the full file
 A file system uses some form of indexing mechanism to
allocate disk space to files
 the entire disk space is partitioned into fixed-length
byte sequences called blocks
 The file system maintains an index for each file and a
list of free blocks
 The index for a particular file specifies the block
numbers and their exact sequence used for storing the
file data
 the list of free blocks contains the block numbers that
are currently free and may be allocated to any file for
storing new data
 In the shadow blocks technique,
◦ a tentative version of a file is created simply by copying the
index of the current version of that file
◦ a tentative index of the file is created from its current index
◦ when a file update operation affects a block, a new disk
block is taken from the free list, the new tentative value is
written in it
◦ the old block number in the tentative index is replaced by
the block number of the new block
◦ File update operations that append new data to the file are
also handled in the same way
◦ The new blocks allocated to a tentative version of a file are
called shadow blocks
◦ Subsequent writes to the same file block by the transaction
are performed on the same shadow block
◦ if the transaction aborts,
 the shadow blocks of the tentative version of the file are returned to
the list of free blocks and the tentative index is simply discarded
◦ if the transaction commits,
 the tentative index is made the current index of the file and made
permanent
 The Write-Ahead Log Approach
◦ for each operation of a transaction that modifies a file, a
record is first created and written to a log file known as a
write-ahead log
◦ A write-ahead log is maintained on stable storage and
contains a record for each operation that makes changes to
files
◦ Each record contains
 the identifier of the transaction that is making the modification,
 the identifier of the file that is being modified,
 the items of the file that are being modified,
 the old and new values of each item modified
◦ When the transaction commits,
 a commit record is written to the write-ahead log
◦ if the transaction aborts,
 the information in the write-ahead log is used to roll back the
individual file items to their initial values
◦ For rollback, the write-ahead log records are used one by
one, starting from the last record and going backward, to
undo the changes described in them
◦ The write-ahead log also facilitates recovery from crashes
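The log record structure and rollback procedure above can be sketched directly; note that a real write-ahead log lives on stable storage, whereas this illustration keeps it in an in-memory list:

log = []   # write-ahead log: one record per modifying operation

def t_write(tid, filename, item, new_value, files):
    """Record old and new values in the log before applying the change."""
    old_value = files[filename][item]
    log.append({"tid": tid, "file": filename, "item": item,
                "old": old_value, "new": new_value})
    files[filename][item] = new_value

def commit(tid):
    log.append({"tid": tid, "commit": True})   # commit record written to the log

def rollback(tid, files):
    """Undo the records one by one, last record first, restoring old values."""
    for rec in reversed(log):
        if rec.get("tid") == tid and "old" in rec:
            files[rec["file"]][rec["item"]] = rec["old"]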
Concurrency Control
 Serializability is an important property of atomic transactions
 It ensures that concurrently executing transactions do not interfere
with each other
 to prevent data inconsistency due to concurrent access by multiple
transactions
 every transaction mechanism needs to implement a concurrency
control algorithm
 concurrency control mechanism allows maximum concurrency with
minimum overhead
 The simplest approach for concurrency control would be to allow the
transactions to be run one at a time
◦ so that two transactions never run concurrently and hence there is
no conflict
◦ this approach does not allow any concurrency
 A slightly more flexible approach is to allow two transactions to run
concurrently only if they do not use a common file
 however, it is usually not possible to predict which data items will be used
by a transaction
 therefore, more flexible concurrency control algorithms are normally used
by a transaction mechanism
 The most commonly used are
1. Locking
2. Optimistic concurrency control
3. Timestamps
 Locking
◦ A transaction locks a data item before accessing it
◦ Each lock is labeled with the transaction identifier
◦ the transaction that locked the data item can access
it any number of times
◦ Other transactions that want to access the same data
item must wait until the data item is unlocked
◦ All data items locked by a transaction are unlocked
as soon as the transaction completes (commits or
aborts)
◦ Locking is performed by the transaction service as a
part of the data access operations
◦ clients have no access to operations for locking or
unlocking data items
 Optimized Locking for Better Concurrency
◦ Two optimizations have been proposed for better concurrency
1. Type-specific locking
 A single type of lock used for all types of accesses to data items
reduces concurrency
 Better concurrency can be achieved when more than one type of
lock is used, based on the semantics of the access operations
 let us consider the two types of access operations read and write
 when a transaction is accessing a data item in the read-only mode,
 there is no reason to keep those transactions waiting that also want to access the
data item in the read-only mode
 Instead of using a single lock for both read and write accesses
 separate locks (read locks and write locks) should be used for the two
operations
 With these two types of locks, the usual locking rules apply: several
transactions may hold read locks on a data item simultaneously, but a
write lock excludes all other locks on that item
2. Intention-to-write locks
◦ transaction has two phases
 data item are tentative in the first phase
 Made permanent only in the second phase
◦ when a read lock is set,
 a transaction should be allowed to proceed with its tentative writes until
it is ready to commit
◦ The value of the item will not actually change until the writing
transaction commits
◦ Gifford proposed the use of an "intention-to-write lock" (I-write)
and a commit lock instead of a write lock
◦ if a read lock is set,
 an I-write lock is permitted on the data item and vice versa
◦ If an l-write lock is set,
 no other transaction is allowed to have an I-write lock on the same data
item
◦ A commit lock is not permitted if any other type of lock is
already set on the data item
◦ when a transaction having an I-write lock commits, its I-write
lock is converted to a commit lock
◦ if there are any outstanding read locks, the transaction must wait
until it is possible to set the commit lock
Two-Phase Locking Protocol
 two commonly encountered problems due to early
release of read locks and write locks are as follows
1. Possibility of reading inconsistent data in case of two or
more read accesses by the same transaction
2. Need for cascaded aborts
 Aborting of already committed transactions when a transaction
aborts is known as cascaded aborting
 To avoid the data inconsistency problems, transaction
systems use the two-phase locking protocol
 In the first phase of a transaction, known as the
growing phase,
◦ all locks needed by the transaction are gradually acquired
 in the second phase of the transaction, known as the
shrinking phase,
◦ the acquired locks are released
 once a transaction has released any of its locks, it
cannot request any more locks on the same or other
data items
 Granularity of Locking
 The granularity of locking refers to the unit of lockable
data items
 this unit is normally an entire file, a page, or a record
 if locks can be applied only to whole files, concurrency
gets severely restricted due to the increased possibility
of false sharing
 False sharing occurs when two different transactions
access two unrelated data items that reside in the same
file
 A finer locking granularity increases concurrency by
reducing the possibility of false sharing
Handling of Locking Deadlocks
 The locking scheme can lead to deadlocks.
 A deadlock is a state in which a transaction waits for a
data item locked by another transaction that, in turn,
waits (perhaps via a chain of other waiting transactions)
for the first transaction to release some of its locks
 For example, two transactions T1 and T2 have
locked data items D1 and D2 , respectively.
 Now suppose that T1 requests a lock on D2 and T2 requests a
lock on D1; neither request can be granted, so both
transactions wait indefinitely
 The commonly used techniques for handling
deadlocks are:
1. Avoidance
2. Detection
3. Timeouts
1. Avoidance
◦ lock data items be always made in a predefined order so that
there can be no cycle in the who-waits-for-whom graph
2. Detection
◦ Deadlocks can be detected by constructing and checking a
who-waits-for-whom graph (a sketch follows this list)
◦ A cycle in the graph indicates the existence of a deadlock
◦ When such a cycle is detected,
 the server must select and abort a transaction out of the transactions
involved in the cycle
3. Timeouts
◦ A timeout period with each lock
◦ A lock remains invulnerable for a fixed period, after which it
becomes vulnerable
◦ A data item with a vulnerable lock remains locked if no other
transaction is waiting for it to get unlocked
◦ Otherwise, the lock is broken (the data item is unlocked) and
the waiting process is permitted to lock the data item for
accessing it
◦ The transaction whose lock has been broken is normally
aborted
◦ Three major drawbacks of the timeout approach are
 it is hard to decide the length of the timeout period for a lock
 in an overloaded system, the number of transactions getting aborted
due to timeouts will increase
 the method favors short transactions
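For the detection technique above, a minimal sketch follows. It assumes each blocked transaction waits for exactly one other transaction, so the who-waits-for-whom graph can be stored as a simple dictionary; the function name is invented:

def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    for start in waits_for:
        seen, t = [], start
        while t in waits_for:                 # follow the waits-for edges
            if t in seen:
                return seen[seen.index(t):]   # the cycle itself
            seen.append(t)
            t = waits_for[t]
    return None

# The example above: T1 waits for T2 (holder of D2), T2 waits for T1 (holder of D1).
print(find_deadlock({"T1": "T2", "T2": "T1"}))   # ['T1', 'T2']

When a cycle is found, the server aborts one of the listed transactions to break the deadlock.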
Optimistic Concurrency Control
 In this approach, transactions are allowed to proceed
uncontrolled up to the end of the first phase
 in the second phase,
◦ before a transaction is committed, the transaction is
validated to see if any of its data items have been changed
by any other transaction since it started
◦ The transaction is committed if found valid; otherwise it is
aborted
 For the validation process, two records are kept of the
data items accessed within a transaction
◦ a read set that contains the data items read by the
transaction
◦ a write set that contains the data items changed, created, or
deleted by the transaction
 To validate a transaction,
◦ its read set and write set are compared with the write sets of all
of the concurrent transactions that reached the end of their first
phase before it
◦ The validation fails if any data item present in the read set or
write set of the transaction being validated is also present in the
write set of any of the concurrent transactions
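This validation rule is easy to state in code. A minimal sketch, with sets of item names standing in for the read and write sets (the function name is invented):

def validate(read_set, write_set, earlier_write_sets):
    """Fail if any item this transaction read or wrote appears in the write
    set of a concurrent transaction that finished its first phase earlier."""
    accessed = read_set | write_set
    return all(not (accessed & ws) for ws in earlier_write_sets)

# Fails: item 'x' was read here but written by an earlier concurrent transaction.
print(validate({"x"}, {"y"}, [{"x", "z"}]))   # False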
 Two main advantages of the optimistic concurrency
control approach are
1. It allows maximum parallelism because all transactions are
allowed to proceed independently in parallel without any need
to wait for a lock
2. It is free from deadlock
 It suffers from the following drawbacks:
1. It requires that old versions of files corresponding to recently
committed transactions be retained for the validation process
2. Although it is free from deadlock, it may cause the starvation of a transaction that repeatedly fails validation and is restarted
3. Increased overhead for rerunning the aborted transactions
 Timestamps
◦ each operation in a transaction is validated when it is
carried out
◦ If the validation fails, the transaction is aborted
immediately and it can then be restarted
◦ To perform validation at the operation level,
 each transaction is assigned a unique timestamp at the moment it executes begin_transaction
 every data item has a read timestamp and a write timestamp
associated with it
◦ When a transaction accesses a data item,
 depending on the type of access (read or write),
 the data item's read timestamp or write timestamp is
updated to the transaction's timestamp
◦ when a transaction is in progress,
 there will be a number of data items with tentative values and write timestamps
◦ The tentative values and timestamps become
permanent when the transaction commits
 Before performing a read operation or a write
operation on a data item,
◦ the server performs a validation check by inspecting the
timestamps on the data item,
◦ including the timestamps on its tentative values that
belong to incomplete transactions
 The rules for validation are as follows:
1. Validation of a Write Operation
◦ If the timestamp of the current transaction is either equal
to or more recent than the read and (committed) write
timestamps of the accessed data item, the write operation
passes the validation check
◦ if the timestamp of the current transaction is older than
the timestamp of the last read or committed write of the
data item, the validation fails
2. Validation of a Read Operation
◦ If the timestamp of the current transaction is more recent
than the write timestamps of all committed and tentative
values of the accessed data item, the read operation
passes the validation check
◦ the read operation can be performed immediately
only
 if there are no tentative values of the data item;
 otherwise it must wait until the completion of the
transactions having tentative values of the data item
 The validation check fails and the current
transaction is aborted in the following cases:
◦ The timestamp of the current transaction is older
than the timestamp of the most recent (committed)
write to the data item
◦ The timestamp of the current transaction is older
than that of a tentative value of the data item made
by another transaction, although it is more recent
than the timestamp of the permanent data item
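The rules above can be condensed as follows. A minimal sketch, assuming each data item carries its committed read/write timestamps and the timestamps of any tentative values (all names invented):

def validate_write(txn_ts, read_ts, committed_write_ts):
    # Passes only if no later transaction has read or (committed-)written the item.
    return txn_ts >= read_ts and txn_ts >= committed_write_ts

def validate_read(txn_ts, committed_write_ts, tentative_write_ts):
    if txn_ts < committed_write_ts:
        return "abort"        # older than the most recent committed write
    if any(txn_ts < t for t in tentative_write_ts):
        return "abort"        # older than another transaction's tentative value
    if tentative_write_ts:
        return "wait"         # must wait for the pending tentative writes to complete
    return "read"             # passes and can be performed immediately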
Distributed Transaction Service
 A distributed transaction service is an extension of
the conventional transaction service
 It can support transactions involving files managed
by more than one server
 When a transaction involves multiple servers, all the
servers need to communicate with one another
 Servers coordinate their actions during the
processing of the transaction
 A simple approach to coordinating these actions is to have all client requests pass through a single server
 To avoid unnecessary communication overhead
◦ a distributed transaction service normally allows client
requests to be sent directly to the server that holds the
relevant file
 In a distributed transaction service,
◦ a client begins a transaction by sending a
begin_transaction request to any server
◦ The contacted server executes the
begin_transaction request and returns the resulting
TID to the client
◦ This server becomes the coordinator for the
transaction and is responsible for aborting or
committing it and for adding other servers called
workers
◦ Workers are dynamically added to the transaction
 For this, a distributed transaction service has a new
operation add_transaction (TID, server_id of coordinator)
 Before an access request is sent to a server
◦ An add_transaction request is sent to the server
◦ When the server receives the add_transaction
request,
 it records the server identifier of the coordinator
 makes a new transaction record containing the TID
 initializes a new log to record the updates to local files from
the transaction
◦ makes a call to the coordinator to inform it of its
intention to join the transaction
 In this manner,
◦ each worker comes to know about the coordinator
◦ the coordinator comes to know about and keeps a
list of all the workers involved in the transaction
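The joining steps can be sketched as follows; the field names and the message format are invented for illustration, and send stands in for whatever messaging primitive the system provides:

def handle_add_transaction(worker, tid, coordinator_id, send):
    worker.coordinators[tid] = coordinator_id       # record the coordinator's identity
    worker.records[tid] = {"status": "active"}      # new transaction record for this TID
    worker.logs[tid] = []                           # log for updates to local files
    send(coordinator_id, ("join", tid, worker.node_id))   # announce intention to join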
 Two-Phase Multiserver Commit Protocol
 A crucial part of the design of a distributed transaction service is the committing of distributed transactions
 Since the files changed within a transaction are stored on multiple servers, the commit protocol becomes more complicated
 A crash of one server does not normally affect other servers
 The general protocol for committing distributed transactions has
two phases
 When the client of a distributed transaction makes an
end_transaction request
 The coordinator and the workers in the transaction have tentative
values in their logs
 The coordinator is responsible for deciding whether the
transaction should be aborted or committed
 if any server is unable to commit, the whole transaction must be
aborted
 The end_transaction operation is performed in two phases-
◦ preparation phase
◦ commitment phase
 Preparation Phase
1. The coordinator makes an entry in its log
that it is starting the commit protocol.
2. It then sends a prepare message to all the
workers telling them to prepare to
commit.
 The message has a timeout value
associated with it.
3. When a worker gets the message, it checks
to see if it is ready to commit
 If so, it makes an entry in its log and
replies with a ready message
 Otherwise, it replies with an abort message
 Commitment Phase
 The coordinator has received a ready or abort reply from
each worker or the prepare message has timed out
1. If all the workers are ready to commit, the transaction is
committed
 The coordinator makes an entry in its log indicating that the transaction has been
committed
 It then sends a commit message to the workers asking them to commit
 The transaction is effectively completed, so the coordinator can report success to the
client
 Otherwise, if any of the replies was abort or the prepare message of any worker timed out, the transaction is aborted
 The coordinator makes an entry in its log indicating that the transaction has been
aborted
 It then sends an abort message to the workers asking them to abort and reports
failure to the client
2. When a worker receives the commit message, it makes a
committed entry in its log and sends a committed reply to the
coordinator
3. When the coordinator has received a committed reply from all
the workers,
 The transaction is considered complete, and its records maintained by the coordinator are erased
 The coordinator keeps resending the commit message until it receives
the committed reply from all the workers
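Putting the two phases together, the coordinator's side of the protocol can be sketched as below. This is a simplification: send(worker, message) is an invented stand-in that returns the worker's reply, or None if the message timed out, and coordinator crashes are ignored:

def coordinator_commit(tid, workers, log, send):
    log.append(("start-commit", tid))                    # phase 1: preparation
    replies = [send(w, ("prepare", tid)) for w in workers]
    if all(r == "ready" for r in replies):               # phase 2: commitment
        log.append(("committed", tid))
        for w in workers:                                # resend until acknowledged
            while send(w, ("commit", tid)) != "committed":
                pass
        return "success"
    log.append(("aborted", tid))                         # an abort or timeout aborts all
    for w in workers:
        send(w, ("abort", tid))
    return "failure"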
Nested Transactions
 Nested transactions are a generalization of the transaction concept: a transaction may be composed of other transactions, called subtransactions
 A subtransaction may in turn have its own
subtransactions
 Tree terminology is normally used in describing
relationships among the transactions
 When a transaction starts, it consists of only one
transaction (process) called the top-level transaction
 This transaction may fork off children, giving rise to
subtransactions
 Each of these children may again fork off its own
children
 When a transaction forks a subtransaction, it is called
the parent of the subtransaction
 The subtransaction is referred to as its child
 A transaction is an ancestor and a descendant of itself
 Committing of Nested Transactions
◦ A transaction may commit only after all its descendants
have committed
◦ A transaction may abort at any time
◦ For an entire transaction family to commit, its top-level transaction must wait for the other transactions in the family to commit
◦ A subtransaction appears atomic to its parent
◦ The changes made to data items by the subtransaction
become visible to its parent only after the subtransaction
commits and notifies this to its parent
◦ if a failure occurs that causes a subtransaction to abort
before its completion
 all of its tentative updates are undone, and its parent is notified
◦ The parent may then choose to continue processing and
try to complete its task using an alternative method or it
may abort itself
◦ if a failure causes an ancestor transaction to abort,
 the updates of all its descendant transactions (that have already
committed) have to be undone
◦ No updates performed within an entire transaction
family are made permanent
 until the top-level transaction commits
◦ Only after the top-level transaction commits is success
reported to the client
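The commit rule for a transaction family can be sketched with a small tree structure; the class and state names are invented:

class NestedTransaction:
    def __init__(self, parent=None):
        self.parent, self.children, self.state = parent, [], "active"
        if parent is not None:
            parent.children.append(self)     # register as the parent's child

    def commit(self):
        if any(c.state == "active" for c in self.children):
            raise RuntimeError("must wait for all descendants to commit")
        self.state = "committed"             # now visible to the parent only
        if self.parent is None:
            self.state = "permanent"         # top-level commit: updates become permanent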
 Advantages of Nested Transactions
1. It allows concurrency within a transaction
 a transaction may generate several subtransactions that run in
parallel on different processors
 all children of a parent transaction are synchronized so that the parent transaction still exhibits serializability
2. It provides greater protection against failures, in
that it allows checkpoints to be established within a
transaction
 when a subtransaction aborts,
 its parent can still continue and may fork an alternative
subtransaction in place of the failed subtransaction in order to
complete its task
DESIGN PRINCIPLES
1. Clients have cycles to burn
◦ if possible, it is always preferable to perform an operation on a
client's own machine rather than performing it on a server
machine
◦ This principle aims at enhancing the scalability of the design
2. Cache whenever possible
◦ Caching of data at clients' sites frequently improves overall
system performance because it makes data available wherever
it is being currently used
◦ Saving a large amount of computing time and network
bandwidth
◦ Improves performance, scalability, user mobility, and site
autonomy
3. Exploit usage properties
◦ Files should be grouped into a small number of easily
identifiable classes
◦ Class-specific properties should be exploited for independent
optimization for improved performance
4. Minimize systemwide knowledge and change
◦ Aimed at enhancing the scalability of design
◦ Monitoring or automatically updating of global
information should be avoided as far as practicable
◦ The following techniques are used to apply this principle
 The callback approach for cache validation
 The use of negative rights in an access control list (ACL) based
access control mechanism
 Hierarchical system structure
5. Trust the fewest possible entities
◦ Aimed at enhancing the security of the system
◦ The security of the system should be based on the integrity of a much smaller number of servers rather than on trusting thousands of clients
6. Batch if possible
◦ Helps in improving performance greatly
◦ Grouping operations together can improve throughput
◦ Transfer of data across the network in large chunks
rather than as individual pages is much more efficient

More Related Content

What's hot

Communications is distributed systems
Communications is distributed systemsCommunications is distributed systems
Communications is distributed systemsSHATHAN
 
Distributed system notes unit I
Distributed system notes unit IDistributed system notes unit I
Distributed system notes unit INANDINI SHARMA
 
Communication in Distributed Systems
Communication in Distributed SystemsCommunication in Distributed Systems
Communication in Distributed SystemsDilum Bandara
 
OSI Network model ppt
OSI Network model pptOSI Network model ppt
OSI Network model pptextraganesh
 
Transport control protocols for Wireless sensor networks
Transport control protocols for Wireless sensor networksTransport control protocols for Wireless sensor networks
Transport control protocols for Wireless sensor networksRushin Shah
 
Design Goals of Distributed System
Design Goals of Distributed SystemDesign Goals of Distributed System
Design Goals of Distributed SystemAshish KC
 
Distributed process and scheduling
Distributed process and scheduling Distributed process and scheduling
Distributed process and scheduling SHATHAN
 
Types of Load distributing algorithm in Distributed System
Types of Load distributing algorithm in Distributed SystemTypes of Load distributing algorithm in Distributed System
Types of Load distributing algorithm in Distributed SystemDHIVYADEVAKI
 
Computer Network Notes UNIT II
Computer Network Notes UNIT IIComputer Network Notes UNIT II
Computer Network Notes UNIT IINANDINI SHARMA
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed Systemsvampugani
 
RPC communication,thread and processes
RPC communication,thread and processesRPC communication,thread and processes
RPC communication,thread and processesshraddha mane
 
Atm( Asynchronous Transfer mode )
Atm( Asynchronous Transfer mode )Atm( Asynchronous Transfer mode )
Atm( Asynchronous Transfer mode )Ali Usman
 

What's hot (20)

Communications is distributed systems
Communications is distributed systemsCommunications is distributed systems
Communications is distributed systems
 
Distributed system notes unit I
Distributed system notes unit IDistributed system notes unit I
Distributed system notes unit I
 
6.distributed shared memory
6.distributed shared memory6.distributed shared memory
6.distributed shared memory
 
Tcp ip
Tcp ipTcp ip
Tcp ip
 
Communication in Distributed Systems
Communication in Distributed SystemsCommunication in Distributed Systems
Communication in Distributed Systems
 
OSI Network model ppt
OSI Network model pptOSI Network model ppt
OSI Network model ppt
 
Transport control protocols for Wireless sensor networks
Transport control protocols for Wireless sensor networksTransport control protocols for Wireless sensor networks
Transport control protocols for Wireless sensor networks
 
Distributed Coordination-Based Systems
Distributed Coordination-Based SystemsDistributed Coordination-Based Systems
Distributed Coordination-Based Systems
 
Stream oriented communication
Stream oriented communicationStream oriented communication
Stream oriented communication
 
Replication in Distributed Systems
Replication in Distributed SystemsReplication in Distributed Systems
Replication in Distributed Systems
 
Message passing in Distributed Computing Systems
Message passing in Distributed Computing SystemsMessage passing in Distributed Computing Systems
Message passing in Distributed Computing Systems
 
Design Goals of Distributed System
Design Goals of Distributed SystemDesign Goals of Distributed System
Design Goals of Distributed System
 
Distributed process and scheduling
Distributed process and scheduling Distributed process and scheduling
Distributed process and scheduling
 
Types of Load distributing algorithm in Distributed System
Types of Load distributing algorithm in Distributed SystemTypes of Load distributing algorithm in Distributed System
Types of Load distributing algorithm in Distributed System
 
Computer Network Notes UNIT II
Computer Network Notes UNIT IIComputer Network Notes UNIT II
Computer Network Notes UNIT II
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed Systems
 
RPC communication,thread and processes
RPC communication,thread and processesRPC communication,thread and processes
RPC communication,thread and processes
 
Atm( Asynchronous Transfer mode )
Atm( Asynchronous Transfer mode )Atm( Asynchronous Transfer mode )
Atm( Asynchronous Transfer mode )
 
Distributed shared memory ch 5
Distributed shared memory ch 5Distributed shared memory ch 5
Distributed shared memory ch 5
 
File replication
File replicationFile replication
File replication
 

Similar to Distributed file systems chapter 9

Similar to Distributed file systems chapter 9 (20)

Dos unit 4
Dos unit 4Dos unit 4
Dos unit 4
 
12. dfs
12. dfs12. dfs
12. dfs
 
Chapter-5-DFS.ppt
Chapter-5-DFS.pptChapter-5-DFS.ppt
Chapter-5-DFS.ppt
 
Distributed File System.ppt
Distributed File System.pptDistributed File System.ppt
Distributed File System.ppt
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMS
 
Distributed file systems dfs
Distributed file systems   dfsDistributed file systems   dfs
Distributed file systems dfs
 
DFS PPT.pptx
DFS PPT.pptxDFS PPT.pptx
DFS PPT.pptx
 
Distributed File Systems
Distributed File SystemsDistributed File Systems
Distributed File Systems
 
Ds
DsDs
Ds
 
Ds
DsDs
Ds
 
UNIT7-FileMgmt.pptx
UNIT7-FileMgmt.pptxUNIT7-FileMgmt.pptx
UNIT7-FileMgmt.pptx
 
Ch16 OS
Ch16 OSCh16 OS
Ch16 OS
 
OS_Ch16
OS_Ch16OS_Ch16
OS_Ch16
 
OSCh16
OSCh16OSCh16
OSCh16
 
Data Analytics: HDFS with Big Data : Issues and Application
Data Analytics:  HDFS  with  Big Data :  Issues and ApplicationData Analytics:  HDFS  with  Big Data :  Issues and Application
Data Analytics: HDFS with Big Data : Issues and Application
 
Chapter 17 - Distributed File Systems
Chapter 17 - Distributed File SystemsChapter 17 - Distributed File Systems
Chapter 17 - Distributed File Systems
 
3. distributed file system requirements
3. distributed file system requirements3. distributed file system requirements
3. distributed file system requirements
 
File System operating system operating system
File System  operating system operating systemFile System  operating system operating system
File System operating system operating system
 
File system in operating system e learning
File system in operating system e learningFile system in operating system e learning
File system in operating system e learning
 
Ch10 file system interface
Ch10   file system interfaceCh10   file system interface
Ch10 file system interface
 

Recently uploaded

Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 

Recently uploaded (20)

Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 

Distributed file systems chapter 9

  • 1. Distributed File Systems Presented By Dr. A. ASHOK KUMAR Assistant Professor, Department Of Computer Science, Alagappa Government Arts College, Karaikudi – 630003. ashokamjuno@rediffmail.com
  • 2. INTRODUCTION  A file is a named object  The two main purposes of using files are as follows: 1. Permanent storage of information - storing a file on a secondary storage media 2. Sharing of information - a file can be created by one application and then shared with different applications  A file system is a subsystem of an operating system that performs file management activities  organization, storing, retrieval, naming, sharing, and protection of files
  • 3.  A distributed file system provides the users of a distributed system to use files in a distributed environment  The design and implementation is more complex than a conventional file system  A distributed file system supports the following: 1. Remote information sharing  a file to be transparently accessed by processes of any node of the system irrespective of the file‘s location 2. User mobility  a user should not be forced to work on a specific node but should have the flexibility to work on different nodes at different times  This property is desirable due to: node failures, work at different places
  • 4. 3. Availability  For better fault tolerance, files should be available for use even in the event of temporary failure of one and more nodes of the system  distributed file system keeps multiple copies of a file on different nodes of the system  Each copy is called a replica of the file 4. Diskless workstations  A distributed file system, with its transparent remote file- accessing capability, allows the use of diskless workstations in a system  A distributed file system typically provides the following three types of services ◦ Storage service  allocation and management of space on a secondary storage device  the storage service is also known as disk service  allocate disk space in units of fixed-size blocks,  the storage service is also known as block service in these systems
  • 5. ◦ True file service  It is concerned with the operations on individual files  operations for accessing and modifying the data in files and for creating and deleting files  typical design issues of a true file service component  file-accessing mechanism,  file-sharing semantics,  file-caching mechanism,  file replication mechanism,  concurrency control mechanism,  data consistency and multiple copy update protocol,  access control mechanism ◦ Name service  mapping between text names for files and references to files, that is, file lD’s  file systems use directories to perform this mapping  the name service is also known as a directory service  creation and deletion of directories,  adding a new file to a directory,  deleting a file from a directory,  changing the name of a file,  moving a file from one directory to another
  • 6. DESIRABLE FEATURES OF A GOOD DISTRIBUTED FILE SYSTEM 1. Transparency ◦ Structure transparency  a distributed file system uses multiple file servers  the multiplicity of file servers should be transparent to the clients of a distributed file system ◦ Access transparency  the file system interface should not distinguish between local and remote files ◦ Naming transparency  The name of a file should give no hint as to where the file is located ◦ Replication transparency  the existence of multiple copies and their locations should be hidden from the clients
  • 7. 2. User mobility ◦ a user should have the flexibility to work on different nodes at different times 3. Performance ◦ The performance of a file system is measured as the average amount of time needed to satisfy client requests ◦ In centralized file systems,  this time includes the time for accessing the secondary storage device on which the file is stored and the CPU processing time ◦ In a distributed file system,  this time also includes network communication overhead when the accessed file is remote 4. Simplicity and ease of use ◦ the user interface to the file system must be simple and the number of commands should be as small as possible ◦ the semantics of a distributed file system should be the same as a conventional centralized time-sharing system
  • 8. 5. Scalability ◦ a good distributed file system should be designed to easily cope with the growth of nodes and users in the system ◦ A scalable design should withstand high service load, accommodate growth of the user community, and enable simple integration of added resources 6. High availability ◦ the file system may show degradation in performance, functionality ◦ Replication of files at multiple servers is the primary mechanism for providing high availability 7. High reliability ◦ the probability of loss of stored data should be minimized ◦ The file system should automatically generate backup copies of critical files
  • 9. 8. Data integrity ◦ A file is often shared by multiple users ◦ the file system must guarantee the integrity of data stored in it ◦ concurrent access requests from multiple users properly synchronized ◦ Atomic transactions are a high-level concurrency control mechanism 9. Security ◦ protect information stored in a file system against unauthorized access 10. Heterogeneity ◦ flexibility to their users to use different computer platforms for different applications ◦ allows a variety of workstations to participate in the sharing of files via the distributed file system ◦ File systems is the ability to accommodate several different storage media
  • 10. FILE MODELS  The two criteria for file modeling are structure and modifiability  Unstructured and Structured Files ◦ a file is an unstructured sequence of data  no substructure known to the file server  the contents of each file of the file system appears to the file server as an uninterpreted sequence of bytes  the interpretation of  the meaning and structure of the data stored in the files are entirely up to the application programs ◦ structured file mode  a file appears to the file server as an ordered sequence of records  Records of different files of the same file system can be of different size  a record is the smallest unit of file data that can be accessed  the file system read or write operations are carried out on a set of records
  • 11.  Structured files are again of two type ◦ nonindexed records  a file record is accessed by specifying its position within the file ◦ indexed records  records have one or more key fields and can be addressed by specifying the values of the key fields  modern operating systems use the unstructured file model  different applications can interpret the contents of a file in different ways  files also normally have attributes  A file's attributes are information describing that file  Each attribute has a name and a value ◦ Attributes of a file: owner, size, access permissions, date of creation, date of last modification , and date of last access  File attributes are normally maintained and used by the directory service
  • 12. Mutable And Immutable Files  Most existing operating systems use the mutable file model ◦ an update performed on a file overwrites on its old contents to produce the new contents  immutable file model ◦ a file cannot be modified once it has been created except to be deleted ◦ The file versioning approach is normally used to implement file updates ◦ a new version of the file is created each time a change is made to the file contents ◦ the immutable file model suffers from two potential problems  increased use of disk space  increased disk allocation activity
  • 13. FILE-ACCESSING MODELS  The file-accessing model of a distributed file system mainly depends on two factors: ◦ the method used for accessing remote files ◦ the unit of data access  Accessing Remote Files  Remote service model ◦ the processing of the client's request is performed at the server's node ◦ the file server interface and the communication protocols must be designed carefully to minimize the overhead of generating messages  Data-caching model ◦ Used to reduce the amount of network traffic by taking advantage of the locality feature found in file accesses ◦ if the data needed to satisfy the client's access request is not present locally ◦ it is copied from the server's node to the client's node and is cached there ◦ The client's request is processed on the client's node itself by using the cached data ◦ A replacement policy LRU is used to keep the cache size bounded ◦ to modifying the locally cached copy of the data the changes be made in the original file at the server node and other caches having the data  The problem of keeping the cached data consistent with the original file content is referred to as the cache consistency problem
  • 14. Unit of Data Transfer  Unit of data transfer refers to the fraction (or its multiples) of a file data that is transferred to and from clients as a result of a single read or write operation  The four commonly used data transfer models: 1. File-level transfer model 2. Block-level transfer model 3. Byte-level transfer model 4. Record-level transfer model  File-level transfer model ◦ when an operation requires file data to be transferred across the network between a client and a server, the whole file is moved ◦ this model has several advantages  transmitting an entire file on request is more efficient than transmitting it page by page – multiple request –avoid overhead  it has better scalability because it requires fewer accesses to file servers - reduced server load and network traffic  disk access routines on the servers can be better optimized  entire file is cached at a client's site, it becomes immune to server and network failures ◦ Drawback is it requires storage space on the client's node for storing all the required files
  • 15.  Block-level transfer model ◦ file data transfers between a client and a server in units of file blocks ◦ A file block is a contiguous portion of a file and is usually fixed in length ◦ block size is equal to virtual memory page size ◦ this model is also called a page-level transfer model ◦ Advantage is it does not require client nodes to have large storage space ◦ used in systems having diskless workstations ◦ when an entire file is to be accessed,  multiple server requests are needed in this model,  resulting in more network traffic ◦ and more network protocol overhead
  • 16.  Byte-level transfer model ◦ file data transfers between a client and a server take place in units of bytes ◦ provides maximum flexibility because  it allows storage and retrieval of an arbitrary sequential subrange of a file,  specified by an offset within a file,  and a length. ◦ Drawback is the difficulty in cache management due to the variable-length data for different access requests  Record-level transfer model ◦ In this model file contents are structured in the form of records ◦ file data transfers between a client and a server take place in units of records
  • 17. FILE-SHARING SEMANTICS  File sharing semantics adopted by a file system  UNIX semantics ◦ an absolute time ordering on all operations ◦ every read operation on a file sees the effects of all previous write operations performed on that file ◦ implemented in file systems for single processor systems ◦ it is easy to serialize all read/write requests ◦ due to network delays, client requests from different nodes may arrive and get processed in different order ◦ distributed file systems normally implement a more relaxed semantics of file sharing
  • 18.
  • 19.  Session semantics ◦ A session is a series of file accesses made between the open and close operations ◦ all changes made to a file during a session are initially made visible only to the client process ◦ once the session is closed, the changes made to the file are made visible to remote processes ◦ session semantics multiple clients are allowed to perform both read and write accesses concurrently on the same file ◦ each client maintains its own image of the file ◦ When a client closes its session, all other remote clients who continue to use the file are actually using a stale copy of the file ◦ session semantics should be used only with those file systems that use the file level transfer model
  • 20.  Immutable shared-files semantics ◦ It is based on the use of the immutable file model ◦ once the creator of a file declares it to be sharable, the file is treated as immutable ◦ it cannot be modified any more ◦ Changes to the file are handled by creating a new updated version of the file  Transaction-like semantics ◦ based on the transaction mechanism ◦ A transaction is a set of operations enclosed in-between a pair of begin_transaction- and end_transaction-like operations ◦ partial modifications made to the shared data by a transaction will not be visible to other concurrently executing transactions until the transaction ends ◦ beginning and end of a transaction are implicit in the open and close file operations ◦ transactions can involve only one file
  • 21. FILE-CACHING SCHEMES  File caching has been implemented in centralized time-sharing systems to improve file I/O performance  file caching is to retain recently accessed file data in main memory  repeated accesses to the same information can be handled without additional disk transfers  file caching reduces disk transfers  file-caching scheme for a distributed file system may also contribute to its ◦ scalability and ◦ reliability  file-caching scheme for a centralized file system has several key decisions ◦ Granularity of cached data (large versus small), ◦ Cache size (large versus small, fixed versus dynamically changing), ◦ And the replacement policy  file-caching scheme for a distributed file system address the following key decisions ◦ Cache location ◦ Modification propagation ◦ Cache validation
  • 22. Cache location  Cache location refers to the place where the cached data is stored  there are three possible cache locations in a distributed file system ◦ Server's main memory ◦ Client's disk ◦ Client's main memory
  • 23. 1. Server's main memory  When no caching scheme is used, before a remote client can access a file, ◦ the file must first be transferred from the server's disk to the server‘s main memory ◦ and then across the network from the server's main memory to the client‘s main memory  the total cost involved is one disk access and one network access  A cache located in the server's main memory eliminates the disk access cost on a cache hit  The decision to locate the cache in the server's main memory may be due to the following reasons ◦ It is easy to implement and is totally transparent to the clients. ◦ It is easy to always keep the original file and cached data consistent since both reside on the same node  the cache in the server's main memory involves a network access for each file access operation
  • 24. 2. Client's disk  A cache located in a client's disk  It eliminates network access cost but requires disk access cost on a cache hit  A cache on a disk has several advantages ◦ Reliability - cached data are lost in a crash  the data is still there during recovery  and there is no need to fetch it again from the server's node ◦ Large storage capacity  compared to a main-memory cache, a disk cache has plenty of storage space  distributed file systems use the file-level data transfer model in which a file is always cached in its entirety  drawback of having cached d.ata on a client's disk  Does not work if the system is to support diskless workstations  server's main-memory cache eliminates disk access but requires network access on a cache hit.  a client's disk cache eliminates network access but requires disk access on a cache hit
  • 25. 3. Client's main memory  A cache located in a client's main memory  eliminates both network access cost and disk access cost  It permits workstations to be diskless  A client's main-memory cache is not preferable to a client's disk cache  when large cache size and increased reliability of cached data are desired
  • 26. Modification Propagation  Caches of all these nodes contain exactly the same copies of the file data, the caches are consistent  caches to become inconsistent when the file data is changed by one of the clients and other nodes are not changed or discarded  file data cached at multiple client nodes must be consistent  To handle this issue have been proposed and implemented  the following cache design issues for distributed file systems 1. When to propagate modifications made to a cached data to the corresponding file server 2. How to verify the validity of cached data  The modification propagation schemes are ◦ Write-through Scheme ◦ Delayed-Write Scheme
  • 27. Write-through Scheme  when a cache entry is modified, ◦ The new value is immediately sent to the server ◦ Server update the master copy of the file  two main advantages ◦ high degree of reliability ◦ and suitability for UNIX-like semantics  the risk of updated data getting lost (when a client crashes) is very low  drawback of this scheme is ◦ its poor write performance ◦ each write access has to wait until the information is written to the master copy of the server
  • 28. Delayed-Write Scheme  the write-through scheme helps on reads  to reduce network traffic for writes use the delayed-write scheme  when a cache entry is modified, ◦ The new value is written only to the cache ◦ and the client just makes a note that the cache entry has been updated ◦ Some time later, all updated cache entries corresponding to a file are gathered together ◦ and sent to the server at a time  delayed-write policies are of different types 1. Write on ejection from cache 2. Periodic write 3. Write on close
  • 29.  Write on ejection from cache ◦ modified data in a cache entry is sent to the server when  the cache replacement policy has decided to eject it from the client‘s cache  Periodic write ◦ The cache is scanned periodically, at regular intervals, and any cached data that have been modified since the last scan are sent to the server  Write on close ◦ The modifications made to a cached data by a client are sent to the server when the corresponding file is closed by the client ◦ The write-on-close policy is a perfect match for the session semantics ◦ the close operation takes a long time because all modified data must be written to the server before the operation completes
  • 30. Cache Validation Schemes  It is necessary to verify if the data cached at a client node is consistent with the master copy  If not, the cached data must be invalidated and the updated version of the data must be fetched again from the server  There are basically two approaches 1. Client-Initiated Approach 2. Server-Initiated Approach 1. Client-Initiated Approach ◦ A client contacts the server and checks whether its locally cached data is consistent with the master copy ◦ The file-sharing semantics depends on the frequency of the validity check
  • 31. ◦ One of the following approaches may be used ◦ Checking before every access.  This approach defeats the main purpose of caching  because the server has to be contacted on every access.  But it is suitable for supporting UNIX-like semantics ◦ Periodic checking.  In this method, a check is initiated every fixed interval of time  The main problem of this method is that it results in fuzzier file-sharing semantics  Because the data on which an access operation is performed is timing dependent ◦ Check on file open  In this method, a client's cache entry is validated only when the client opens the corresponding file for use  This method is suitable for supporting session semantics  implementing session semantics in a distributed file system is  to use the file-level transfer model coupled with the write-on-close modification propagation policy and the check-on-file-open cache validation policy
  • 32. 2. Server-Initiated Approach  In this method, ◦ a client informs the file server when opening a file, indicating whether the file is being opened for reading, writing, or both ◦ The file server keeps a record of which client has which file open and in what mode ◦ the server keeps monitoring the file usage modes being used by different clients ◦ and reacts whenever it detects a potential for inconsistency. ◦ A potential for inconsistency occurs when two or more clients try to open a file in conflicting modes ◦ For example,  if a file is open for reading, other clients may be allowed to open it for reading without any problem,  But opening it for writing cannot be allowed ◦ a new client should not be allowed to open a file in any mode if the file is already open for writing ◦ When a client closes a file,  it sends an intimation to the server along with any modifications made to the file.  On receiving such an intimation, the server updates its record of which client has which file open in what mode
  • 33. ◦ server-initiated approach has the following problems  It violates the traditional client-server model in which servers simply respond to service request activities initiated by clients  It requires that file servers be stateful  A check-on-open, client-initiated cache validation approach must still be used along with the server- initiated approach
  • 34. FILE REPLICATION  A replicated file is a file that has multiple copies  each copy located on a separate file server  Each copy of the set of copies that comprises a replicated file is referred to as a replica of the replicated file  Difference between Replication and Caching Replication Caching A replica is associated with a server cached copy is normally associated with a client replica normally depends on availability and performance requirements cached copy is primarily dependent on the locality in file access patterns replica is more persistent, widely known, secure, available, complete, and accurate cache is not persistent periodic revalidation with respect to a replica can a cached copy be useful cached copy is contingent upon a replica
  • 35. Advantages of Replication 1. Increased availability ◦ the system remains operational and available to the users despite failures ◦ By replicating critical data on servers with independent failure modes, the probability that one copy of the data will be accessible increases ◦ alternate copies of a replicated data can be used when the primary copy is unavailable 2. Increased reliability ◦ Due to the presence of redundant information in the system, recovery from catastrophic failures becomes possible 3. Improved response time ◦ Replication also helps in improving response time ◦ it enables data to be accessed either locally or from a node to which access time is lower than the primary copy access time
  • 36. 4. Reduced network traffic ◦ If a file's replica is available with a file server that resides on a client's node, ◦ the client's access requests can be serviced locally 5. Improved system throughput ◦ Replication also enables several clients' requests for access to the same file to be serviced in parallel by different servers 6. Better scalability ◦ By replicating the file on multiple servers, ◦ the same requests can now be serviced more efficiently by multiple servers due to workload distribution 7. Autonomous operation ◦ distributed system that provides file replication as a service to their clients ◦ all files required by a client for operation during a limited time period may be replicated on the file server
  • 37. Replication Transparency  A replicated file service must function exactly like a non replicated file service  replication of files should be designed to be transparent to the users  multiple copies of a replicated file appear as a single logical file to its users  the read, write, and other file operations should have the same client interface  Two important issues related to replication transparency are ◦ naming of replicas ◦ and replication control
  • 38. Naming of Replicas  the replication transparency requirement calls for the assignment of a single identifier to all replicas of an object  Assignment of a single identifier to all replicas of an object are immutable objects  there is only one logical object with a given identifier  In mutable objects, different copies of a replicated object may not be the same (consistent) at a particular instance of time  if all replicas are consistent, the mapping must provide the locations of all replicas  and a mechanism to identify the relative distances of the replicas from the user's node  For accessing mutable objects accessing the replica is ◦ the responsibility of the naming system to map a user- supplied identifier into the appropriate replica of a mutable object.
  • 39. Replication Control  Replication control includes determining the number and locations of replicas of a replicated file  the replication control is handled entirely automatically, in a user-transparent manner  under certain circumstances, it is desirable to expose these details to users and to provide them with the flexibility to control the replication process  if replication facility is provided to support autonomous operation of workstations,  users should be provided with the flexibility to create a replica of the desired files on their local nodes  Depending on whether replication control is user transparent or not, the replication process is of two types 1. Explicit replication 2. Implicit/lazy replication
  • 40. 1. Explicit replication ◦ users are given the flexibility to control the entire replication process ◦ when a process creates a file, it specifies the server on which the file should be placed ◦ if desired, additional copies of the file can be created on other servers on explicit request by the users ◦ Users also have the flexibility to delete one or more replicas of a replicated file 2. Implicit/lazy replication ◦ the entire replication process is automatically controlled by the system without users' knowledge ◦ when a process creates a file, it does not provide any information about its location ◦ The system automatically selects one server for the placement of the file ◦ the system automatically creates replicas of the file on other servers, based on some replication policy used by the system ◦ The system automatically delete any extra copies when they are no longer needed ◦ Lazy replication is normally performed in the background when the server has some free time
  • 41. Multicopy Update Problem  maintaining consistency among copies  when a replicated file is updated is the major design issue of a file system that supports replication of files  commonly used approaches to handle this issue are ◦ Read-Only Replication ◦ Read-Any-Write-All Protocol ◦ Available-Copies Protocol ◦ Primary-Copy Protocol ◦ Quorum-Based Protocols
  • 42.  Read-Only Replication ◦ allows the replication of only immutable files ◦ immutable files are used only in the read-only mode ◦ because mutable files cannot be replicated, ◦ The multicopy update problem does not arise ◦ Files known to be frequently read and modified only once in a while  Read-Any-Write-All Protocol ◦ replication scheme that can support the replication of mutable files ◦ replication scheme support the replication of mutable files is the read-any-write-all protocol ◦ In this method  a read operation on a replicated file is performed by reading any copy of the file and a write operation by writing to all copies of the file  Some form of locking has to be used to carry out a write operation  before updating any copy,  all copies are locked,  then they are updated,  and finally the locks are released to complete the write
  • 43.  Available-Copies Protocol ◦ The main problem with the read-any-write-all protocol is  a write operation cannot be performed if any of the servers having a copy of the replicated file is down at the time of the write operation ◦ The available-copies protocol relaxes this restriction and allows write operations to be carried out ◦ In this method,  a read operation is performed by reading any available copy,  but a write operation is performed by writing to all available copies ◦ The basic idea is  when a server recovers after a failure,  It brings itself up to date by copying from other servers before accepting any user request  Failed servers (sites) are dynamically detected by high- priority status management routines  and configured out of the system while newly recovered sites are configured back in
  • 44.  Primary-Copy Protocol ◦ In this protocol,  for each replicated file, one copy is designated as the primary copy and all the others are secondary copies.  Read operations can be performed using any copy, primary or secondary  But, all write operations are directly performed only on the primary copy  Each server having a secondary copy updates its copy either  by receiving notification of changes from the server having the primary copy  or by requesting the updated copy from it ◦ for UNIX-like semantics  when the primary-copy server receives an update request,  it immediately orders all the secondary-copy servers to update their copies ◦ A fuzzier consistency semantics  if a write operation completes as soon as the primary copy has been updated  The secondary copies are then lazily updated either in the background or  when requested for an updated version by their servers
  • 45.  Quorum-Based Protocols ◦ The read-any-write-all and available-copies protocols cannot handle the network partition problem, in which the copies of a replicated file are partitioned into two or more active groups ◦ A quorum-based protocol is capable of handling the network partition problem ◦ and can increase the availability of write operations ◦ A quorum-based protocol works as follows  there are a total of n copies of a replicated file F  To read the file, a minimum of r copies of F have to be consulted.  This set of r copies is called a read quorum  To perform a write operation on the file, a minimum of w copies of F have to be written  This set of w copies is called a write quorum  The constraint on the values of r and w is that the sum of the read and write quorum sizes must be greater than the total number of copies n (r + w > n)
  • 46.  the quorum protocol does not require that write operations be executed on all copies of a replicated file  therefore, it becomes necessary to be able to identify a current (up-to-date) copy in a quorum  This is achieved by associating a version number attribute with each copy  The version number of a copy is updated every time the copy is modified  A copy with the largest version number in a quorum is current  The new version number assigned to each copy is one more than the version number associated with the current copy
  • 47.  A read is executed as follows: 1. Retrieve a read quorum (any r copies) of F. 2. Of the r copies retrieved, select the copy with the largest version number. 3. Perform the read operation on the selected copy.  A write is executed as follows: 1. Retrieve a write quorum (any w copies) of F. 2. Of the w copies retrieved, get the version number of the copy with the largest version number. 3. Increment the version number. 4. Write the new value and the new version number to all the w copies of the write quorum.
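The read and write procedures above can be sketched in a few lines of Python. This is a minimal illustration only, assuming an in-memory list of replica objects; the names Replica, quorum_read, and quorum_write are invented for the example, not taken from any real system.

```python
import random

class Replica:
    """One copy of the replicated file F, with its version number."""
    def __init__(self):
        self.version = 0
        self.value = None

def quorum_read(replicas, r):
    # Retrieve any r copies and use the one with the largest version number.
    quorum = random.sample(replicas, r)
    current = max(quorum, key=lambda rep: rep.version)
    return current.value, current.version

def quorum_write(replicas, w, new_value):
    # Retrieve any w copies, compute the next version number from the
    # largest one seen, and write value and version to all w copies.
    quorum = random.sample(replicas, w)
    new_version = max(rep.version for rep in quorum) + 1
    for rep in quorum:
        rep.value, rep.version = new_value, new_version

replicas = [Replica() for _ in range(5)]   # n = 5
r, w = 2, 4                                # r + w > n, so the quorums overlap
quorum_write(replicas, w, "balance=100")
print(quorum_read(replicas, r))            # ('balance=100', 1)
```

Because r + w > n, every read quorum intersects every write quorum, so the copy with the largest version number in any read quorum is guaranteed to be current.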
  • 49.  Several special algorithms can be derived from the generalized quorum protocol; a few are: 1. Read-any-write-all protocol ◦ The read-any-write-all protocol is actually a special case of the generalized quorum protocol with r = 1 and w = n ◦ This protocol is suitable for use when the ratio of read to write operations is large 2. Read-all-write-any protocol ◦ For this protocol r = n and w = 1 ◦ This protocol may be used in those cases where the ratio of write to read operations is large 3. Majority-consensus protocol ◦ In this protocol, the sizes of both the read quorum and the write quorum are made either equal or nearly equal 4. Consensus with weighted voting ◦ Each copy is assigned some number of votes; a read quorum of r votes is collected to read a file and a write quorum of w votes to write a file ◦ Since the votes assigned to each copy need not be the same, the size of a read/write quorum depends on the copies selected for the quorum
  • 50. FAULT TOLERANCE  Various types of faults could harm the integrity of the stored data  A processor loses the contents of its main memory in the event of a crash ◦ possibly making the data stored by the file system inconsistent  During request processing, the server or client machine may crash ◦ resulting in the loss of state information of the file being accessed  The file properties that characterize the ability of a distributed file system to tolerate faults are as follows 1. Availability ◦ Availability of a file refers to the fraction of time for which the file is available for use ◦ The availability property depends on the location of the file and the locations of its clients (users) ◦ Replication is a primary mechanism for improving the availability of a file
  • 51. 2. Robustness ◦ Robustness of a file refers to its power to survive crashes of the storage device and decays of the storage medium on which it is stored ◦ Storage devices implemented using redundancy techniques, such as a stable storage device, are often used to store robust files ◦ A robust file may not be available until the faulty component has been recovered ◦ Robustness is independent of both the location of the file and the locations of its clients 3. Recoverability ◦ Recoverability of a file refers to its ability to be rolled back to an earlier, consistent state when an operation on the file fails or is aborted by the client ◦ Atomic update techniques such as a transaction mechanism are used to implement recoverable files
  • 52.  Stable Storage  In the context of crash resistance capability, storage may be broadly classified into three types: 1. Volatile storage, such as RAM, which cannot withstand power failures or machine crashes 2. Nonvolatile storage, such as a disk, which can withstand CPU failures but cannot withstand transient I/O faults and decay of the storage media 3. Stable storage, which can withstand even transient I/O faults and decay of the storage media  The basic idea of stable storage is to use duplicate storage devices to implement a stable device  and to ensure that any period when only one of the two component devices is operational is significantly less than the mean time between failures (MTBF) of the stable device
  • 53.  A disk-based stable-storage system consists of a pair of ordinary disks (say disk 1 and disk 2) that are assumed to decay independently  Each block on disk 2 is an exact copy of the corresponding block on disk 1  Effective fault tolerance facilities are provided to ensure that both disks are not damaged at the same time  As with conventional disks, ◦ the two basic operations related to a stable disk are read and write ◦ A read operation first attempts to read from disk 1 ◦ If it fails, the read is done from disk 2 ◦ A write operation writes to both disks, but the write to disk 2 does not start until the write to disk 1 has been successfully completed ◦ This is to avoid the possibility of both disks getting damaged at the same time by a hardware fault  A recovery action compares the contents of the two disks block by block  Whenever two corresponding blocks differ, the block having incorrect data is regenerated from the corresponding block on the other disk  Which of the two blocks holds the correct data depends on the time at which the crash occurred
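This write discipline (disk 1 first, then disk 2) can be sketched as follows. The sketch is a simplified illustration, assuming two ordinary files stand in for the pair of disks; the block size and file names are arbitrary choices for the example.

```python
import os

BLOCK_SIZE = 512

def make_disk(path, blocks=8):
    # Create an image file standing in for one of the two disks.
    with open(path, "wb") as f:
        f.write(b"\x00" * blocks * BLOCK_SIZE)

def stable_write(block_no, data, disks=("disk1.img", "disk2.img")):
    # The write to disk 2 does not start until the write to disk 1 has
    # completed, so a single crash can damage at most one copy.
    assert len(data) == BLOCK_SIZE
    for path in disks:
        with open(path, "r+b") as f:
            f.seek(block_no * BLOCK_SIZE)
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # force the block onto the medium

def stable_read(block_no, disks=("disk1.img", "disk2.img")):
    # A read first tries disk 1; if that fails, it is done from disk 2.
    for path in disks:
        try:
            with open(path, "rb") as f:
                f.seek(block_no * BLOCK_SIZE)
                return f.read(BLOCK_SIZE)
        except OSError:
            continue
    raise OSError("both copies unreadable")

make_disk("disk1.img"); make_disk("disk2.img")
stable_write(3, b"x" * BLOCK_SIZE)
assert stable_read(3) == b"x" * BLOCK_SIZE
```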
  • 54.  Effect of Service Paradigm on Fault Tolerance  A server may be implemented by using any one of the following two service paradigms ◦ Stateful ◦ Stateless  Stateful File Servers ◦ A stateful file server maintains clients' state information from one access request to the next ◦ The state information created while serving one request is subsequently used when executing subsequent requests ◦ To help the server decide how long to retain the state information of a client, all access requests for a file by a client are performed within an open and a close operation, together called a session ◦ The server creates state information for a client when the client starts a new session by performing an open operation ◦ and discards the state information when the client closes the session by performing a close operation ◦ To illustrate how a stateful file server works, consider the following file operations
  • 55.  Open (filename, mode): ◦ This operation is used to open a file identified by filename in the specified mode ◦ When the server executes this operation, it creates an entry for this file in a file table ◦ that it uses for maintaining the file state information of all the open files ◦ When a file is opened, its read-write pointer is set to zero ◦ and the server returns to the client a file identifier (fid) that is used by the client for subsequent accesses to that file  Read (fid, n, buffer): ◦ This operation is used to get n bytes of data from the file identified by fid into the specified buffer  Write (fid, n, buffer): ◦ On execution of this operation, the server takes n bytes of data from the specified buffer and writes it into the file identified by fid  Seek (fid, position): ◦ This operation causes the server to change the value of the read-write pointer of the file identified by fid to the new value specified as position  Close (fid): ◦ This operation causes the server to delete from its file table the file state information of the file identified by fid
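A minimal sketch of a stateful server built around such a file table; the class below is illustrative only (local Python file objects stand in for the server's storage, and fids are allocated from a simple counter).

```python
class StatefulFileServer:
    def __init__(self):
        self.file_table = {}     # fid -> [open file object, read-write pointer]
        self.next_fid = 0

    def open(self, filename, mode="rb"):
        fid = self.next_fid
        self.next_fid += 1
        self.file_table[fid] = [open(filename, mode), 0]  # pointer starts at 0
        return fid               # the client uses fid in subsequent requests

    def read(self, fid, n):
        f, pos = self.file_table[fid]
        f.seek(pos)
        data = f.read(n)
        self.file_table[fid][1] = pos + len(data)   # advance the pointer
        return data

    def seek(self, fid, position):
        self.file_table[fid][1] = position

    def close(self, fid):
        f, _ = self.file_table.pop(fid)             # discard the state
        f.close()
```

Because the read-write pointer lives in the server's file table, a client's second Read continues where the first one stopped; this is exactly the state that is lost if the server crashes.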
  • 57.  Stateless File Servers ◦ A stateless file server does not maintain any client state information ◦ Each request identifies the file and the position in the file for the read/write access ◦ A server supporting the following operations on files is stateless: ◦ Read (filename, position, n, buffer):  the server returns to the client n bytes of data of the file identified by filename  The returned data is placed in the specified buffer  The position within the file from where to begin reading is specified as the position parameter ◦ Write (filename, position, n, buffer):  It takes n bytes of data from the specified buffer and writes it into the file identified by filename  The position parameter specifies the byte position within the file from where to start writing
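For contrast, the two stateless operations can be sketched as plain functions; since every request carries the filename and position, nothing needs to survive between calls. Buffers are replaced by return values here, a simplification of the interface shown above.

```python
def read(filename, position, n):
    # Stateless Read: the request itself contains everything needed.
    with open(filename, "rb") as f:
        f.seek(position)
        return f.read(n)

def write(filename, position, data):
    # Stateless Write: repeating it writes the same bytes at the same
    # position, so the operation is idempotent (useful when a client
    # retransmits a request).
    with open(filename, "r+b") as f:
        f.seek(position)
        f.write(data)
```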
  • 59.  Advantages of the Stateless Service Paradigm in Crash Recovery
 Stateless servers ◦ Crash recovery is very easy because no client state information is maintained by the server and each request contains all the information that is necessary to complete the request ◦ When a server crashes while serving a request, the client need only resend the request until the server responds, and the server does no crash recovery at all ◦ When a client crashes during request processing, no recovery is necessary for either the client or the server ◦ Drawbacks: longer request messages and slower processing of requests, because a stateless server does not maintain any state information to speed up the processing
 Stateful servers ◦ The stateful service paradigm requires complex crash recovery procedures; both client and server need to reliably detect crashes ◦ The server needs to detect client crashes so that it can discard any state it is holding for the client, and the client must detect server crashes
  • 60.  The stateless service paradigm imposes the following constraints on the design of the distributed file system 1. Each request of the stateless service paradigm identifies the file by its filename instead of a low-level file identifier  If the translation of remote names to local names is done for each request, the request processing overhead will increase  To avoid the translation process, each file should have a systemwide unique low-level name associated with it 2. The retransmission of requests by clients requires that the operations supported by stateless servers be idempotent  Self-contained read and write operations are idempotent  Operations to delete a file should also be made idempotent if the stateless service paradigm is used
  • 61. ATOMIC TRANSACTIONS  An atomic transaction is a computation consisting of a collection of operations that take place indivisibly in the presence of failures and concurrent computations  Transactions help to preserve the consistency of a set of shared data objects in the face of failures and concurrent access  They make crash recovery much easier,  because a transaction can only end in two states ◦ Transaction carried out completely or ◦ Transaction failed completely  Transactions have the following properties: 1. Atomicity 2. Serializability 3. Permanence
  • 62. 1. Atomicity ◦ This property ensures that all the operations of a transaction appear to have been performed indivisibly ◦ Two essential requirements for atomicity are 1. Failure atomicity  ensures that if a transaction's work is interrupted by a failure, any partially completed results will be undone  Failure atomicity is also known as the all-or-nothing property because a transaction is always performed either completely or not at all 2. Concurrency atomicity  ensures that while a transaction is in progress, other processes executing concurrently with the transaction cannot modify or observe intermediate states of the transaction  Concurrency atomicity is also known as consistency property
  • 63. 2. Serializability ◦ This property (also known as the isolation property) ensures that concurrently executing transactions do not interfere with each other ◦ The concurrent execution of a set of two or more transactions is serially equivalent, that is, ◦ the result of performing them concurrently is always the same as if they had been executed one at a time in some (system-dependent) order 3. Permanence ◦ This property (also known as the durability property) ensures that once a transaction completes successfully, the results of its operations become permanent ◦ and cannot be lost even if the corresponding process or the processor on which it is running crashes
  • 64. Need for Transactions in a File Service  Transactions in a file service are needed for two main reasons: 1. For improving the recoverability of files in the event of failures 2. For allowing the concurrent sharing of mutable files by multiple clients in a consistent manner Inconsistency Due to System Failure ◦ Consider the banking transaction of the figure below, which comprises four operations (a1, a2, a3, a4) for transferring $5 from account X to account Z  a1: read balance (x) of account X  a2: read balance (z) of account Z  a3: write (x - 5) to account X  a4: write (z + 5) to account Z ◦ If the system crashes after a3 has been performed but before a4 completes, $5 has been withdrawn from X but never deposited to Z, leaving the accounts in an inconsistent state
  • 65. [Figure: the banking transaction transferring $5 from account X to account Z]
  • 66. Inconsistency Due to Concurrent Access  Consider the two banking transactions T1 and T2 of the figure below  Transaction T1, which is meant for transferring $5 from account X to account Z, consists of four operations a1, a2, a3, and a4  Transaction T2, which is meant for transferring $7 from account Y to account Z, consists of four operations b1, b2, b3, and b4  Assume the initial balance in all the accounts is $100  In a basic file service without a transaction facility ◦ if the operations corresponding to the two transactions are allowed to progress concurrently and ◦ if the file system makes no attempt to serialize the execution of these operations, ◦ unexpected final results may be obtained  In a file service with a transaction facility, ◦ the operations of each of the two transactions are performed indivisibly, ◦ producing correct results irrespective of which transaction is executed first
  • 67-68. [Figures: the operations of transactions T1 and T2 and possible interleavings of their execution]
  • 69. • Any interleaving of the operations of two or more concurrent transactions is known as a schedule • All schedules that produce the same final result as if the transactions had been performed one at a time in some serial order are said to be serially equivalent • Serial equivalence is used as a criterion for the correctness of concurrently executing transactions
  • 70.  Operations for Transaction-Based File Service  A transaction consists of a sequence of elementary file access operations such as read and write  The three essential operations for a transaction service are as follows:  begin_transaction : returns (TID) ◦ Begins a new transaction and returns a unique transaction identifier (TID) ◦ This identifier is used in the other operations of this transaction ◦ All operations within a begin_transaction and an end_transaction form the body of the transaction  end_transaction (TID) : returns (status) ◦ This operation indicates that, from the viewpoint of the client, the transaction completed successfully ◦ The returned status indicates whether the transaction has committed or is inactive because it was aborted by either the client or the server  abort_transaction (TID) ◦ Aborts the transaction, restores any changes made so far within the transaction to the original values, and changes its status to inactive ◦ A transaction is normally aborted in the event of some system failure
  • 71.  In a file system with a transaction facility ◦ Each file access operation of the transaction service corresponds to an elementary file service operation ◦ The additional parameter of the file access operations of the transaction service is the transaction identifier (TID) of the transaction to which the operation belongs ◦ The following are file access operations for a transaction service of the stateless server for byte-stream files ◦ Tread (TID, filename, position, n, buffer)  Returns to the client n bytes of the tentative data resulting from transaction TID, if any has been recorded; otherwise it has the same effect as Read (filename, position, n, buffer) ◦ Twrite (TID, filename, position, n, buffer)  Has the same effect as Write (filename, position, n, buffer),  but records the new data in a tentative form that is made permanent only when the transaction TID commits
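The tentative-data behaviour of Tread and Twrite can be sketched as follows, assuming an in-memory table of tentative writes keyed by TID; a real server would record tentative data on stable storage, and all names here are illustrative.

```python
import itertools

_next_tid = itertools.count(1)
_tentative = {}        # TID -> {(filename, position): tentative data}

def begin_transaction():
    tid = next(_next_tid)
    _tentative[tid] = {}
    return tid

def twrite(tid, filename, position, data):
    # Recorded in tentative form only; nothing touches the file yet.
    _tentative[tid][(filename, position)] = data

def tread(tid, filename, position, n):
    # Return tentative data from this transaction if any was recorded;
    # otherwise behave like an ordinary stateless Read.
    key = (filename, position)
    if key in _tentative[tid]:
        return _tentative[tid][key][:n]
    with open(filename, "rb") as f:
        f.seek(position)
        return f.read(n)

def end_transaction(tid):
    # Commit: the tentative writes are now made permanent.
    for (filename, position), data in _tentative.pop(tid).items():
        with open(filename, "r+b") as f:
            f.seek(position)
            f.write(data)
    return "committed"

def abort_transaction(tid):
    _tentative.pop(tid, None)     # tentative data is simply discarded
    return "inactive"
```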
  • 73. Recovery Techniques  From the point of view of a server, a transaction has two phases  The first phase starts ◦ when the server receives a begin_transaction request from a client ◦ The file access operations in the transaction are performed, and the client progressively adds changes to file items ◦ On execution of the end_transaction or abort_transaction operation, the first phase ends and the second phase starts  In the second phase, ◦ the transaction is either committed or aborted ◦ In a commit, the changes made by the transaction to file items are made permanent ◦ In an abort, the changes made by the transaction to file items are undone to restore the files
  • 74.  While a transaction is in its first phase, and hence still subject to being aborted, its updates must be recorded in a reversible manner  The two commonly used approaches for recording file updates in a reversible manner are ◦ the file versions approach ◦ the write-ahead log approach
  • 75. File Versions Approach  When a transaction begins, the current file version is used for all file access operations (within the transaction) that do not modify the file  When a transaction commits, the changes made by it to a file become public  The current version of a file is the version produced by the most recently committed transaction  When the first operation that modifies the file is encountered within the transaction, ◦ the server creates a tentative version of the file for the transaction from the current file version ◦ and performs the update operation on this version of the file ◦ All subsequent file access operations within the transaction are performed on this tentative file version ◦ When the transaction is committed,  the tentative file version is made the new current version and the previous current version of the file is added to the sequence of old versions ◦ If the transaction is aborted,  the tentative file version is discarded and the current file version remains unchanged
  • 77.  A transaction can modify more than one file  In that case, there is a tentative version of each modified file for the transaction  When one of the concurrent transactions commits, ◦ the tentative version corresponding to that transaction becomes the current version of the file  If there are no serializability conflicts between this transaction and the previously committed transactions, ◦ the tentative version corresponding to this transaction is merged with the current version, ◦ creating a new current version that includes the changes made by all of the transactions that have already committed  If there are serializability conflicts, ◦ all the transactions that are involved, except the first one to commit, are aborted  A serializability conflict occurs when two or more concurrent transactions are allowed to access the same data items in a file and one or more of these accesses is a write operation
  • 78.  Shadow Blocks Technique for Implementing File Versions  The shadow blocks technique is an optimization that allows the creation of a tentative version of a file without the need to copy the full file  A file system uses some form of indexing mechanism to allocate disk space to files  the entire disk space is partitioned into fixed-length byte sequences called blocks  The file system maintains an index for each file and a list of free blocks  The index for a particular file specifies the block numbers and their exact sequence used for storing the file data  the list of free blocks contains the block numbers that are currently free and may be allocated to any file for storing new data
  • 79.  In the shadow blocks technique, ◦ a tentative version of a file is created simply by copying the index of the current version of that file; that is, a tentative index of the file is created from its current index ◦ When a file update operation affects a block, a new disk block is taken from the free list and the new tentative value is written in it ◦ The old block number in the tentative index is replaced by the block number of the new block ◦ File update operations that append new data to the file are also handled in the same way ◦ The new blocks allocated to a tentative version of a file are called shadow blocks ◦ Subsequent writes to the same file block by the transaction are performed on the same shadow block ◦ If the transaction aborts,  the shadow blocks of the tentative version of the file are returned to the list of free blocks and the tentative index is simply discarded ◦ If the transaction commits,  the tentative index is made the current index of the file and made permanent
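The index manipulation is easy to see in code. The sketch below is illustrative only: a dictionary stands in for the disk, the index is a list of block numbers, and commit and abort are reduced to their effect on the tentative index.

```python
disk = {}                          # block number -> block contents
free_list = list(range(100))       # blocks available for allocation

def create_tentative(index):
    # A tentative version is just a copy of the current index;
    # no file data is copied at all.
    return list(index)

def tentative_write(tindex, logical_block, data):
    shadow = free_list.pop(0)      # take a new block from the free list
    disk[shadow] = data            # write the tentative value into it
    tindex[logical_block] = shadow # patch the tentative index only

def abort(tindex, index):
    # Return the shadow blocks to the free list; drop the tentative index.
    for blk in tindex:
        if blk not in index:
            free_list.append(blk)

current = [free_list.pop(0) for _ in range(3)]  # index of the current version
tentative = create_tentative(current)
tentative_write(tentative, 1, b"new data")      # only block 1 gets a shadow
current = tentative                             # commit: tentative index becomes current
```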
  • 81.  The Write-Ahead Log Approach ◦ for each operation of a transaction that modifies a file, a record is first created and written to a log file known as a write-ahead log ◦ A write-ahead log is maintained on stable storage and contains a record for each operation that makes changes to files ◦ Each record contains  the identifier of the transaction that is making the modification,  the identifier of the file that is being modified,  the items of the file that are being modified,  the old and new values of each item modified ◦ When the transaction commits,  a commit record is written to the write-ahead log ◦ if the transaction aborts,  the information in the write-ahead log is used to roll back the individual file items to their initial values ◦ For rollback, the write-ahead log records are used one by one, starting from the last record and going backward, to undo the changes described in them ◦ The write-ahead log also facilitates recovery from crashes
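The rollback mechanics can be sketched in a few lines, assuming an in-memory log and a dictionary of file items; each record stores the old value so that an abort can replay the log backward (the banking example from earlier is reused here).

```python
log = []                           # the write-ahead log
store = {"x": 100, "z": 100}       # file items: balances of accounts X and Z

def logged_write(tid, item, new_value):
    # The log record (including the old value) is written before the update.
    log.append((tid, item, store[item], new_value))
    store[item] = new_value

def rollback(tid):
    # Undo the transaction's changes, starting from the last record
    # and going backward.
    for rec_tid, item, old_value, _new in reversed(log):
        if rec_tid == tid:
            store[item] = old_value

logged_write(1, "x", 95)           # a3: write (x - 5) to account X
logged_write(1, "z", 105)          # a4: write (z + 5) to account Z
rollback(1)                        # abort: both balances return to 100
print(store)                       # {'x': 100, 'z': 100}
```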
  • 83. Concurrency Control  Serializability is an important property of atomic transactions  It ensures that concurrently executing transactions do not interfere with each other  To prevent data inconsistency due to concurrent access by multiple transactions,  every transaction mechanism needs to implement a concurrency control algorithm  A good concurrency control mechanism allows maximum concurrency with minimum overhead  The simplest approach to concurrency control would be to run the transactions one at a time ◦ so that two transactions never run concurrently and hence there is no conflict ◦ but this approach does not allow any concurrency  Another simple approach is to allow two transactions to run concurrently only if they do not use a common file ◦ but it is usually not possible to predict in advance which data items will be used by a transaction  Therefore, more flexible concurrency control algorithms are normally used by a transaction mechanism
  • 84.  The most commonly used approaches are 1. Locking 2. Optimistic concurrency control 3. Timestamps  Locking ◦ A transaction locks a data item before accessing it ◦ Each lock is labeled with the transaction identifier ◦ Only the transaction that locked the data item can access it, any number of times ◦ Other transactions that want to access the same data item must wait until the data item is unlocked ◦ All data items locked by a transaction are unlocked as soon as the transaction completes (commits or aborts) ◦ Locking is performed by the transaction service as a part of the data access operations, ◦ so clients have no access to operations for locking or unlocking data items
  • 85.  Optimized Locking for Better Concurrency ◦ Two optimizations have been proposed for better concurrency 1. Type-specific locking  A single lock type used for all types of accesses to data items reduces concurrency  Better concurrency is achieved when more than one type of lock is used, based on the semantics of access operations  For example, consider the two types of access operations read and write  When a transaction is accessing a data item in the read-only mode,  there is no reason to keep waiting those transactions that also want to access the data item in the read-only mode  So, instead of using a single lock for both read and write accesses,  separate locks (read locks and write locks) should be used for the two operations  With these two types of locks, several transactions may hold read locks on a data item at the same time, but a write lock excludes all other locks on that item
  • 86. 2. Intention-to-write locks ◦ A transaction has two phases  updates to data items are tentative in the first phase  and are made permanent only in the second phase ◦ So, even when a read lock is set,  another transaction should be allowed to proceed with its tentative writes until it is ready to commit ◦ The value of the item will not actually change until the writing transaction commits ◦ For this, Gifford proposed the use of an "intention-to-write" lock (I-write) and a commit lock instead of a write lock ◦ If a read lock is set,  an I-write lock is permitted on the data item, and vice versa ◦ If an I-write lock is set,  no other transaction is allowed to have an I-write lock on the same data item ◦ Setting a commit lock is not permitted if any other type of lock is already set on the data item ◦ When a transaction having an I-write lock commits, its I-write lock is converted to a commit lock ◦ If there are any outstanding read locks, the transaction must wait until it is possible to set the commit lock
  • 88. Two-Phase Locking Protocol  Two commonly encountered problems due to early release of read locks and write locks are as follows 1. Possibility of reading inconsistent data in case of two or more read accesses by the same transaction 2. Need for cascaded aborts  Having to abort transactions that have already read data written by a transaction that subsequently aborts is known as cascaded aborting  To avoid these data inconsistency problems, transaction systems use the two-phase locking protocol  In the first phase of a transaction, known as the growing phase, ◦ all locks needed by the transaction are gradually acquired  In the second phase of the transaction, known as the shrinking phase, ◦ the acquired locks are released  Once a transaction has released any of its locks, it cannot request any more locks on the same or other data items
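A minimal sketch of the two-phase rule itself: once a transaction enters its shrinking phase, it may release locks but acquire no more. The lock manager below is a deliberately naive stand-in (it raises instead of blocking), and all class names are invented for the example.

```python
class LockManager:
    def __init__(self):
        self.owner = {}                    # data item -> owning TID

    def acquire(self, item, tid):
        if self.owner.get(item, tid) != tid:
            raise RuntimeError(f"{item} is locked by T{self.owner[item]}")
        self.owner[item] = tid

    def release(self, item, tid):
        if self.owner.get(item) == tid:
            del self.owner[item]

class Transaction:
    def __init__(self, tid, manager):
        self.tid, self.manager = tid, manager
        self.held = set()
        self.shrinking = False             # set once any lock is released

    def lock(self, item):
        if self.shrinking:                 # the two-phase rule
            raise RuntimeError("cannot acquire a lock after releasing one")
        self.manager.acquire(item, self.tid)
        self.held.add(item)                # growing phase

    def unlock_all(self):
        self.shrinking = True              # shrinking phase begins
        for item in self.held:
            self.manager.release(item, self.tid)
        self.held.clear()
```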
  • 89.  Granularity of Locking  The granularity of locking refers to the unit of lockable data items  This unit is normally an entire file, a page, or a record  If locks can be applied only to whole files, concurrency gets severely restricted due to the increased possibility of false sharing  False sharing occurs when two different transactions access two unrelated data items that reside in the same file  A finer locking granularity increases concurrency by reducing the possibility of false sharing
  • 90. Handling of Locking Deadlocks  The locking scheme can lead to deadlocks  A deadlock is a state ◦ in which a transaction waits for a data item locked by another transaction that in turn waits, perhaps via a chain of other waiting transactions, for the first transaction to release some of its locks  For example, suppose two transactions T1 and T2 have locked data items D1 and D2, respectively  Now if T1 requests a lock on D2 and T2 requests a lock on D1, both wait for each other indefinitely  The commonly used techniques for handling deadlocks are: 1. Avoidance 2. Detection 3. Timeouts
  • 91. 1. Avoidance ◦ Requests to lock data items are always made in a predefined order so that there can be no cycle in the who-waits-for-whom graph 2. Detection ◦ Deadlocks can be detected by constructing and checking a who-waits-for-whom graph ◦ A cycle in the graph indicates the existence of a deadlock ◦ When such a cycle is detected,  the server must select and abort one of the transactions involved in the cycle 3. Timeouts ◦ A timeout period is associated with each lock ◦ A lock remains invulnerable for a fixed period, after which it becomes vulnerable ◦ A data item with a vulnerable lock remains locked if no other transaction is waiting for it to get unlocked ◦ Otherwise, the lock is broken (the data item is unlocked) and the waiting process is permitted to lock the data item for accessing it ◦ The transaction whose lock has been broken is normally aborted
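Detection reduces to finding a cycle in the who-waits-for-whom graph. The sketch below assumes the simplest case, in which each transaction waits for at most one other, so the graph can be a plain dictionary; a general implementation would track multiple outgoing edges.

```python
def has_deadlock(waits_for):
    # waits_for maps each waiting transaction to the transaction it
    # waits on; a cycle reachable from any node means deadlock.
    for start in waits_for:
        seen, t = set(), start
        while t in waits_for:
            if t in seen:
                return True
            seen.add(t)
            t = waits_for[t]
    return False

# T1 holds D1 and wants D2; T2 holds D2 and wants D1: a deadlock.
print(has_deadlock({"T1": "T2", "T2": "T1"}))   # True
print(has_deadlock({"T1": "T2"}))               # False
```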
  • 92. ◦ Three major drawbacks of the timeout approach are  it is hard to decide the length of the timeout period for a lock  in an overloaded system, the number of transactions getting aborted due to timeouts will increase  the method favors short transactions Optimistic Concurrency Control  In this approach, transactions are allowed to proceed uncontrolled up to the end of the first phase  in the second phase, ◦ before a transaction is committed, the transaction is validated to see if any of its data items have been changed by any other transaction since it started ◦ The transaction is committed if found valid; otherwise it is aborted  For the validation process, two records are kept of the data items accessed within a transaction ◦ a read set that contains the data items read by the transaction ◦ a write set that contains the data items changed, created, or deleted by the transaction
  • 93.  To validate a transaction, ◦ its read set and write set are compared with the write sets of all of the concurrent transactions that reached the end of their first phase before it ◦ The validation fails if any data item present in the read set or write set of the transaction being validated is also present in the write set of any of those concurrent transactions  Two main advantages of the optimistic concurrency control approach are 1. It allows maximum parallelism because all transactions are allowed to proceed independently in parallel without any need to wait for a lock 2. It is free from deadlock  It suffers from the following drawbacks: 1. It requires that old versions of files corresponding to recently committed transactions be retained for the validation process 2. Although it is free from deadlock, it may cause the starvation of a transaction 3. Increased overhead for rerunning the aborted transactions
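The validation step itself is a simple set intersection, as the sketch below shows; read and write sets are modelled as Python sets of data item names, which is an illustrative simplification.

```python
def validate(read_set, write_set, earlier_write_sets):
    # Fails if any item read or written by the committing transaction
    # appears in the write set of a concurrent transaction that
    # finished its first phase earlier.
    for other_writes in earlier_write_sets:
        if (read_set | write_set) & other_writes:
            return False
    return True

print(validate({"x"}, {"z"}, [{"y"}]))        # True: no conflict, commit
print(validate({"x"}, {"z"}, [{"x", "y"}]))   # False: x was overwritten, abort
```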
  • 94.  Timestamps ◦ In this approach, each operation in a transaction is validated when it is carried out ◦ If the validation fails, the transaction is aborted immediately, and it can then be restarted ◦ To perform validation at the operation level,  each transaction is assigned a unique timestamp at the moment it performs begin_transaction,  and every data item has a read timestamp and a write timestamp associated with it ◦ When a transaction accesses a data item,  depending on the type of access (read or write),  the data item's read timestamp or write timestamp is updated to the transaction's timestamp ◦ While a transaction is in progress,  there will be a number of data items with tentative values and write timestamps ◦ The tentative values and timestamps become permanent when the transaction commits
  • 95.  Before performing a read operation or a write operation on a data item, ◦ the server performs a validation check by inspecting the timestamps on the data item, ◦ including the timestamps on its tentative values that belong to incomplete transactions  The rules for validation are as follows: 1. Validation of a Write Operation ◦ If the timestamp of the current transaction is either equal to or more recent than the read and (committed) write timestamps of the accessed data item, the write operation passes the validation check ◦ if the timestamp of the current transaction is older than the timestamp of the last read or committed write of the data item, the validation fails 2. Validation of a Read Operation ◦ If the timestamp of the current transaction is more recent than the write timestamps of all committed and tentative values of the accessed data item, the read operation passes the validation check
  • 96. ◦ the read operation can be performed immediately only  if there are no tentative values of the data item;  otherwise it must wait until the completion of the transactions having tentative values of the data item  The validation check fails and the current transaction is aborted in the following cases: ◦ The timestamp of the current transaction is older than the timestamp of the most recent (committed) write to the data item ◦ The timestamp of the current transaction is older than that of a tentative value of the data item made by another transaction, although it is more recent than the timestamp of the permanent data item
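The committed-timestamp part of these rules can be sketched as follows; tentative values and the waiting case are omitted, so this covers only the checks against the read and committed write timestamps, with all structures invented for the example.

```python
class DataItem:
    def __init__(self):
        self.read_ts = 0    # timestamp of the most recent read
        self.write_ts = 0   # timestamp of the most recent committed write

def validate_write(item, ts):
    # Fails if a transaction with a more recent timestamp has already
    # read or written the item.
    if ts < item.read_ts or ts < item.write_ts:
        return False        # the current transaction must be aborted
    item.write_ts = ts
    return True

def validate_read(item, ts):
    # Fails if the item was overwritten by a more recent transaction.
    if ts < item.write_ts:
        return False
    item.read_ts = max(item.read_ts, ts)
    return True
```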
  • 97. Distributed Transaction Service  A distributed transaction service is an extension of the conventional transaction service  It can support transactions involving files managed by more than one server  When a transaction involves multiple servers, all the servers need to communicate with one another  to coordinate their actions during the processing of the transaction  A simple approach to coordinating these actions is to make all client requests pass through a single server  To avoid unnecessary communication overhead, ◦ a distributed transaction service normally allows client requests to be sent directly to the server that holds the relevant file
  • 98.  In a distributed transaction service, ◦ a client begins a transaction by sending a begin_transaction request to any server ◦ The contacted server executes the begin_transaction request and returns the resulting TID to the client ◦ This server becomes the coordinator for the transaction and is responsible for aborting or committing it and for adding other servers called workers ◦ Workers are dynamically added to the transaction  For this, a distributed transaction service has a new operation add_transaction (TID, server_id of coordinator)
  • 99.  Before an access request is sent to a server ◦ An add_transaction request is sent to the server ◦ When the server receives the add_transaction request,  it records the server identifier of the coordinator  makes a new transaction record containing the TID  initializes a new log to record the updates to local files from the transaction ◦ makes a call to the coordinator to inform it of its intention to join the transaction  In this manner, ◦ each worker comes to know about the coordinator ◦ the coordinator comes to know about and keeps a list of all the workers involved in the transaction
  • 100.  Two-Phase Multiserver Commit Protocol  A crucial part of the design of a distributed transaction service is the committing of distributed transactions  Since the files changed within the transaction are stored on multiple servers,  the commit protocol becomes more complicated, because a crash of one server does not normally affect other servers  The general protocol for committing distributed transactions has two phases  When the client of a distributed transaction makes an end_transaction request,  the coordinator and the workers in the transaction have tentative values in their logs  The coordinator is responsible for deciding whether the transaction should be aborted or committed  If any server is unable to commit, the whole transaction must be aborted  The end_transaction operation is performed in two phases: ◦ preparation phase ◦ commitment phase
  • 101.  Preparation Phase 1. The coordinator makes an entry in its log that it is starting the commit protocol. 2. It then sends a prepare message to all the workers telling them to prepare to commit.  The message has a timeout value associated with it. 3. When a worker gets the message, it checks to see if it is ready to commit  If so, it makes an entry in its log and replies with a ready message  Otherwise, it replies with an abort message
  • 102.  Commitment Phase  This phase begins when the coordinator has received a ready or abort reply from each worker or the prepare message has timed out 1. If all the workers are ready to commit, the transaction is committed  The coordinator makes an entry in its log indicating that the transaction has been committed  It then sends a commit message to the workers asking them to commit  The transaction is now effectively completed, so the coordinator can report success to the client  Otherwise, if any of the replies was abort or the prepare message of any worker timed out, the transaction is aborted  The coordinator makes an entry in its log indicating that the transaction has been aborted  It then sends an abort message to the workers asking them to abort and reports failure to the client 2. When a worker receives the commit message, it makes a committed entry in its log and sends a committed reply to the coordinator 3. When the coordinator has received a committed reply from all the workers,  the transaction is considered complete, and all its records maintained by the coordinator are erased  The coordinator keeps resending the commit message until it receives a committed reply from all the workers
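The decision logic of the two phases can be sketched as below. The Worker class and its method names are invented for the illustration; message loss, timeouts, and the resending of commit messages are left out, so this shows only the vote-collection and decision structure.

```python
class Worker:
    def __init__(self, can_commit=True):
        self.can_commit, self.log = can_commit, []

    def prepare(self):                       # phase 1: prepare to commit
        vote = "ready" if self.can_commit else "abort"
        self.log.append(vote)
        return vote

    def commit(self):
        self.log.append("committed")

    def abort(self):
        self.log.append("aborted")

def two_phase_commit(workers, coordinator_log):
    coordinator_log.append("begin-commit")
    votes = [w.prepare() for w in workers]   # collect ready/abort replies
    if all(v == "ready" for v in votes):
        coordinator_log.append("committed")  # the decision is logged first
        for w in workers:                    # phase 2: tell workers to commit
            w.commit()
        return "success"
    coordinator_log.append("aborted")
    for w in workers:
        w.abort()
    return "failure"

print(two_phase_commit([Worker(), Worker()], []))                  # success
print(two_phase_commit([Worker(), Worker(can_commit=False)], []))  # failure
```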
  • 103. Nested Transactions  Nested transactions are a generalization of transactions in which a transaction may be composed of other transactions, called subtransactions  A subtransaction may in turn have its own subtransactions  Tree terminology is normally used in describing relationships among the transactions  When a transaction starts, it consists of only one transaction (process), called the top-level transaction  This transaction may fork off children, giving rise to subtransactions  Each of these children may again fork off its own children  When a transaction forks a subtransaction, it is called the parent of the subtransaction,  and the subtransaction is referred to as its child  A transaction is both an ancestor and a descendant of itself
  • 104.  Committing of Nested Transactions ◦ A transaction may commit only after all its descendants have committed ◦ A transaction may abort at any time ◦ For an entire transaction family to commit, its top-level transaction must wait for all other transactions in the family to commit ◦ A subtransaction appears atomic to its parent ◦ The changes made to data items by the subtransaction become visible to its parent only after the subtransaction commits and notifies its parent of this ◦ If a failure occurs that causes a subtransaction to abort before its completion,  all of its tentative updates are undone, and its parent is notified ◦ The parent may then choose to continue processing and try to complete its task using an alternative method, or it may abort itself ◦ If a failure causes an ancestor transaction to abort,  the updates of all its descendant transactions (even those that have already committed) have to be undone
  • 105. ◦ No updates performed within an entire transaction family are made permanent  until the top-level transaction commits ◦ Only after the top-level transaction commits is success reported to the client  Advantages of Nested Transactions 1. It allows concurrency within a transaction  A transaction may generate several subtransactions that run in parallel on different processors  Since all children of a parent transaction are synchronized, the parent transaction still exhibits serializability 2. It provides greater protection against failures, in that it allows checkpoints to be established within a transaction  When a subtransaction aborts,  its parent can still continue, and may fork an alternative subtransaction in place of the failed subtransaction in order to complete its task
  • 106. DESIGN PRINCIPLES 1. Clients have cycles to burn ◦ if possible, it is always preferable to perform an operation on a client's own machine rather than performing it on a server machine ◦ This principle aims at enhancing the scalability of the design 2. Cache whenever possible ◦ Caching of data at clients' sites frequently improves overall system performance because it makes data available wherever it is being currently used ◦ Saving a large amount of computing time and network bandwidth ◦ Improves performance, scalability, user mobility, and site autonomy 3. Exploit usage properties ◦ Files should be grouped into a small number of easily identifiable classes ◦ Class-specific properties should be exploited for independent optimization for improved performance
  • 107. 4. Minimize systemwide knowledge and change ◦ Aimed at enhancing the scalability of the design ◦ Monitoring or automatically updating global information should be avoided as far as practicable ◦ The following techniques are based on this principle  The callback approach for cache validation  The use of negative rights in an access control list (ACL) based access control mechanism  Hierarchical system structure 5. Trust the fewest possible entities ◦ Aimed at enhancing the security of the system ◦ It is better to ensure security based on the integrity of a much smaller number of servers than to trust thousands of clients 6. Batch if possible ◦ Helps in improving performance greatly ◦ Grouping operations together can improve throughput ◦ Transfer of data across the network in large chunks rather than as individual pages is much more efficient