1. Distributed File Systems
Presented By
Dr. A. ASHOK KUMAR
Assistant Professor,
Department Of Computer Science,
Alagappa Government Arts College,
Karaikudi – 630003.
ashokamjuno@rediffmail.com
2. INTRODUCTION
A file is a named object
The two main purposes of using files are as
follows:
1. Permanent storage of information - storing a file
on a secondary storage media
2. Sharing of information - a file can be created by
one application and then shared with different
applications
A file system is a subsystem of an operating
system that performs file management
activities
organization, storing, retrieval, naming,
sharing, and protection of files
A distributed file system enables the users of
a distributed system to store and access files in a
distributed environment
The design and implementation is more
complex than a conventional file system
A distributed file system supports the
following:
1. Remote information sharing
a file can be transparently accessed by processes on any
node of the system, irrespective of the file's location
2. User mobility
a user should not be forced to work on a specific node but
should have the flexibility to work on different nodes at
different times
This property is desirable because of node failures and
because users may work at different places
4. 3. Availability
For better fault tolerance, files should be available for use even
in the event of temporary failure of one or more nodes of the
system
distributed file system keeps multiple copies of a file on
different nodes of the system
Each copy is called a replica of the file
4. Diskless workstations
A distributed file system, with its transparent remote file-
accessing capability, allows the use of diskless workstations in
a system
A distributed file system typically provides the
following three types of services
◦ Storage service
allocation and management of space on a secondary storage
device
the storage service is also known as the disk service
in systems that allocate disk space in units of fixed-size blocks,
the storage service is also known as the block service
5. ◦ True file service
It is concerned with the operations on individual files
operations for accessing and modifying the data in files and for
creating and deleting files
typical design issues of a true file service component
file-accessing mechanism,
file-sharing semantics,
file-caching mechanism,
file replication mechanism,
concurrency control mechanism,
data consistency and multiple copy update protocol,
access control mechanism
◦ Name service
mapping between text names for files and references to files,
that is, file IDs
file systems use directories to perform this mapping
the name service is also known as a directory service
creation and deletion of directories,
adding a new file to a directory,
deleting a file from a directory,
changing the name of a file,
moving a file from one directory to another
6. DESIRABLE FEATURES OF A GOOD DISTRIBUTED FILE SYSTEM
1. Transparency
◦ Structure transparency
a distributed file system uses multiple file servers
the multiplicity of file servers should be transparent to the
clients of a distributed file system
◦ Access transparency
the file system interface should not distinguish between
local and remote files
◦ Naming transparency
The name of a file should give no hint as to where the file
is located
◦ Replication transparency
the existence of multiple copies and their locations should
be hidden from the clients
7. 2. User mobility
◦ a user should have the flexibility to work on different nodes
at different times
3. Performance
◦ The performance of a file system is measured as the average
amount of time needed to satisfy client requests
◦ In centralized file systems,
this time includes the time for accessing the secondary storage device
on which the file is stored and the CPU processing time
◦ In a distributed file system,
this time also includes network communication overhead when the
accessed file is remote
4. Simplicity and ease of use
◦ the user interface to the file system must be simple and the
number of commands should be as small as possible
◦ the semantics of a distributed file system should be the
same as a conventional centralized time-sharing system
8. 5. Scalability
◦ a good distributed file system should be designed to
easily cope with the growth of nodes and users in the
system
◦ A scalable design should withstand high service load,
accommodate growth of the user community, and
enable simple integration of added resources
6. High availability
◦ in the event of failures, the file system should continue to
function, perhaps with degraded performance or functionality
◦ Replication of files at multiple servers is the primary
mechanism for providing high availability
7. High reliability
◦ the probability of loss of stored data should be
minimized
◦ The file system should automatically generate backup
copies of critical files
9. 8. Data integrity
◦ A file is often shared by multiple users
◦ the file system must guarantee the integrity of data
stored in it
◦ concurrent access requests from multiple users
properly synchronized
◦ Atomic transactions are a high-level concurrency
control mechanism
9. Security
◦ protect information stored in a file system against
unauthorized access
10. Heterogeneity
◦ flexibility to their users to use different computer
platforms for different applications
◦ allows a variety of workstations to participate in the
sharing of files via the distributed file system
◦ another dimension of heterogeneity is the ability of the file
system to accommodate several different storage media
10. FILE MODELS
The two criteria for file modeling are structure
and modifiability
Unstructured and Structured Files
◦ a file is an unstructured sequence of data
no substructure known to the file server
the contents of each file of the file system appears to the file
server as an uninterpreted sequence of bytes
the interpretation of
the meaning and structure of the data stored in the files are
entirely up to the application programs
◦ structured file mode
a file appears to the file server as an ordered sequence of
records
Records of different files of the same file system can be of
different size
a record is the smallest unit of file data that can be accessed
the file system read or write operations are carried out on a
set of records
11. Structured files are again of two types
◦ nonindexed records
a file record is accessed by specifying its position within the file
◦ indexed records
records have one or more key fields and can be addressed by
specifying the values of the key fields
modern operating systems use the unstructured file
model
different applications can interpret the contents of a
file in different ways
files also normally have attributes
A file's attributes are information describing that file
Each attribute has a name and a value
◦ Attributes of a file: owner, size, access permissions, date of
creation, date of last modification, and date of last access
File attributes are normally maintained and used by
the directory service
12. Mutable And Immutable Files
Most existing operating systems use the mutable
file model
◦ an update performed on a file overwrites its old
contents to produce the new contents
immutable file model
◦ a file cannot be modified once it has been created
except to be deleted
◦ The file versioning approach is normally used to
implement file updates
◦ a new version of the file is created each time a change
is made to the file contents
◦ the immutable file model suffers from two potential
problems
increased use of disk space
increased disk allocation activity
13. FILE-ACCESSING MODELS
The file-accessing model of a distributed file system mainly depends on
two factors:
◦ the method used for accessing remote files
◦ the unit of data access
Accessing Remote Files
Remote service model
◦ the processing of the client's request is performed at the server's node
◦ the file server interface and the communication protocols must be designed
carefully to minimize the overhead of generating messages
Data-caching model
◦ Used to reduce the amount of network traffic by taking advantage of the
locality feature found in file accesses
◦ if the data needed to satisfy the client's access request is not present locally
◦ it is copied from the server's node to the client's node and is cached there
◦ The client's request is processed on the client's node itself by using the
cached data
◦ A replacement policy such as LRU (least recently used) is used to keep the
cache size bounded
◦ on modifying the locally cached copy of the data, the changes must eventually
be made to the original file at the server node and to the other caches holding
the data
The problem of keeping the cached data consistent with the original file
content is referred to as the cache consistency problem
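To make the data-caching model concrete, the following is a minimal sketch (not from the original text) of a client-side cache with LRU replacement; fetch_from_server is a hypothetical callback standing in for a request to the file server:

```python
from collections import OrderedDict

class ClientCache:
    """Bounded client-side data cache with LRU replacement (illustrative sketch)."""

    def __init__(self, capacity, fetch_from_server):
        self.capacity = capacity
        self.fetch = fetch_from_server  # hypothetical remote-fetch callback
        self.entries = OrderedDict()    # key -> data, ordered oldest-first

    def read(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]           # cache hit: no network access
        data = self.fetch(key)                 # cache miss: copy from server node
        self.entries[key] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return data

# Hypothetical usage: cache = ClientCache(64, lambda k: server.read_block(k))
```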
14. Unit of Data Transfer
Unit of data transfer refers to the fraction (or its multiples) of a
file data that is transferred to and from clients as a result of a
single read or write operation
The four commonly used data transfer models:
1. File-level transfer model
2. Block-level transfer model
3. Byte-level transfer model
4. Record-level transfer model
File-level transfer model
◦ when an operation requires file data to be transferred across the
network between a client and a server, the whole file is moved
◦ this model has several advantages
transmitting an entire file in response to a single request is more efficient than
transmitting it page by page, since it avoids the overhead of multiple requests
it has better scalability because it requires fewer accesses to file servers -
reduced server load and network traffic
disk access routines on the servers can be better optimized
entire file is cached at a client's site, it becomes immune to server and network
failures
◦ Drawback is it requires storage space on the client's node for storing
all the required files
15. Block-level transfer model
◦ file data transfers between a client and a server in
units of file blocks
◦ A file block is a contiguous portion of a file and is
usually fixed in length
◦ when the block size is equal to the virtual memory page size,
this model is also called a page-level transfer model
◦ Advantage is it does not require client nodes to
have large storage space
◦ used in systems having diskless workstations
◦ when an entire file is to be accessed,
multiple server requests are needed in this model,
resulting in more network traffic
◦ and more network protocol overhead
16. Byte-level transfer model
◦ file data transfers between a client and a server
take place in units of bytes
◦ provides maximum flexibility because
it allows storage and retrieval of an arbitrary sequential
subrange of a file,
specified by an offset within a file,
and a length.
◦ Drawback is the difficulty in cache management
due to the variable-length data for different access
requests
Record-level transfer model
◦ In this model file contents are structured in the
form of records
◦ file data transfers between a client and a server
take place in units of records
17. FILE-SHARING SEMANTICS
File sharing semantics adopted by a file system
UNIX semantics
◦ an absolute time ordering on all operations
◦ every read operation on a file sees the effects of all
previous write operations performed on that file
◦ implemented in file systems for single processor
systems
◦ it is easy to serialize all read/write requests
◦ due to network delays, client requests from different
nodes may arrive and get processed in different order
◦ distributed file systems normally implement a more
relaxed semantics of file sharing
19. Session semantics
◦ A session is a series of file accesses made between the
open and close operations
◦ all changes made to a file during a session are initially
made visible only to the client process
◦ once the session is closed, the changes made to the file
are made visible to remote processes
◦ with session semantics, multiple clients are allowed to
perform both read and write accesses concurrently on
the same file
◦ each client maintains its own image of the file
◦ When a client closes its session, all other remote
clients who continue to use the file are actually using a
stale copy of the file
◦ session semantics should be used only with those file
systems that use the file level transfer model
20. Immutable shared-files semantics
◦ It is based on the use of the immutable file model
◦ once the creator of a file declares it to be sharable, the file is
treated as immutable
◦ it cannot be modified any more
◦ Changes to the file are handled by creating a new updated
version of the file
Transaction-like semantics
◦ based on the transaction mechanism
◦ A transaction is a set of operations enclosed in-between a
pair of begin_transaction- and end_transaction-like
operations
◦ partial modifications made to the shared data by a
transaction will not be visible to other concurrently
executing transactions until the transaction ends
◦ beginning and end of a transaction are implicit in the open
and close file operations
◦ transactions can involve only one file
21. FILE-CACHING SCHEMES
File caching has been implemented in centralized time-sharing systems
to improve file I/O performance
file caching is to retain recently accessed file data in main memory
repeated accesses to the same information can be handled without
additional disk transfers
file caching reduces disk transfers
file-caching scheme for a distributed file system may also contribute to
its
◦ scalability and
◦ reliability
file-caching scheme for a centralized file system has several key
decisions
◦ Granularity of cached data (large versus small),
◦ Cache size (large versus small, fixed versus dynamically changing),
◦ And the replacement policy
file-caching scheme for a distributed file system address the following
key decisions
◦ Cache location
◦ Modification propagation
◦ Cache validation
22. Cache location
Cache location refers to the place where the cached data is
stored
there are three possible cache locations in a distributed file
system
◦ Server's main memory
◦ Client's disk
◦ Client's main memory
23. 1. Server's main memory
When no caching scheme is used, before a remote
client can access a file,
◦ the file must first be transferred from the server's disk to
the server's main memory
◦ and then across the network from the server's main
memory to the client's main memory
the total cost involved is one disk access and one
network access
A cache located in the server's main memory
eliminates the disk access cost on a cache hit
The decision to locate the cache in the server's main
memory may be due to the following reasons
◦ It is easy to implement and is totally transparent to the
clients.
◦ It is easy to always keep the original file and cached data
consistent since both reside on the same node
A drawback is that the cache in the server's main memory
still involves a network access for each file access operation
24. 2. Client's disk
A cache located in a client's disk
It eliminates network access cost but requires disk access
cost on a cache hit
A cache on a disk has several advantages
◦ Reliability - cached data are not lost in a client crash;
the data is still there during recovery
and there is no need to fetch it again from the server's node
◦ Large storage capacity
compared to a main-memory cache, a disk cache has plenty of storage
space
distributed file systems use the file-level data transfer
model in which a file is always cached in its entirety
A drawback of having cached data on a client's disk is that it
does not work if the system is to support diskless
workstations
server's main-memory cache eliminates disk access but
requires network access on a cache hit.
a client's disk cache eliminates network access but
requires disk access on a cache hit
25. 3. Client's main memory
A cache located in a client's main memory
eliminates both network access cost and disk
access cost
It permits workstations to be diskless
A client's main-memory cache is not
preferable to a client's disk cache
when large cache size and increased
reliability of cached data are desired
26. Modification Propagation
When the caches of all nodes contain exactly the same
copies of the file data, the caches are consistent
Caches become inconsistent when the file data is changed
by one of the clients and the cached copies at other nodes
are not correspondingly updated or discarded
Therefore, file data cached at multiple client nodes must be
kept consistent
Several schemes have been proposed and implemented to
handle this issue; they address the following cache design
issues for distributed file systems
1. When to propagate modifications made to cached
data to the corresponding file server
2. How to verify the validity of cached data
The modification propagation schemes are
◦ Write-through Scheme
◦ Delayed-Write Scheme
27. Write-through Scheme
when a cache entry is modified,
◦ The new value is immediately sent to the server
◦ The server updates the master copy of the file
two main advantages
◦ high degree of reliability
◦ and suitability for UNIX-like semantics
the risk of updated data getting lost (when a
client crashes) is very low
drawback of this scheme is
◦ its poor write performance
◦ each write access has to wait until the information
is written to the master copy of the server
28. Delayed-Write Scheme
the write-through scheme helps on reads
to reduce network traffic for writes use the
delayed-write scheme
when a cache entry is modified,
◦ The new value is written only to the cache
◦ and the client just makes a note that the cache entry
has been updated
◦ Some time later, all updated cache entries
corresponding to a file are gathered together
◦ and sent to the server all at once
delayed-write policies are of different types
1. Write on ejection from cache
2. Periodic write
3. Write on close
29. Write on ejection from cache
◦ modified data in a cache entry is sent to the server when
the cache replacement policy has decided to eject it from the client's
cache
Periodic write
◦ The cache is scanned periodically, at regular intervals, and
any cached data that have been modified since the last
scan are sent to the server
Write on close
◦ The modifications made to a cached data by a client are
sent to the server when the corresponding file is closed by
the client
◦ The write-on-close policy is a perfect match for the
session semantics
◦ the close operation takes a long time because all modified
data must be written to the server before the operation
completes
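The write-through and write-on-close policies can be sketched as follows; the server object and its update method are hypothetical placeholders for the real file server interface:

```python
class WriteThroughCache:
    """Write-through sketch: every modification goes to the server immediately."""

    def __init__(self, server):
        self.server = server
        self.data = {}

    def write(self, key, value):
        self.data[key] = value
        self.server.update(key, value)  # wait until the master copy is updated


class WriteOnCloseCache:
    """Delayed-write sketch: changes are noted locally and flushed on close."""

    def __init__(self, server):
        self.server = server
        self.data = {}
        self.dirty = set()              # entries modified since the last flush

    def write(self, key, value):
        self.data[key] = value
        self.dirty.add(key)             # just note that the entry was updated

    def close(self):
        for key in self.dirty:          # gather all updates and send them at once
            self.server.update(key, self.data[key])
        self.dirty.clear()
```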
30. Cache Validation Schemes
It is necessary to verify if the data cached at a
client node is consistent with the master copy
If not, the cached data must be invalidated and
the updated version of the data must be fetched
again from the server
There are basically two approaches
1. Client-Initiated Approach
2. Server-Initiated Approach
1. Client-Initiated Approach
◦ A client contacts the server and checks whether its
locally cached data is consistent with the master copy
◦ The file-sharing semantics depends on the frequency
of the validity check
31. ◦ One of the following approaches may be used
◦ Checking before every access.
This approach defeats the main purpose of caching
because the server has to be contacted on every access.
But it is suitable for supporting UNIX-like semantics
◦ Periodic checking.
In this method, a check is initiated every fixed interval of
time
The main problem of this method is that it results in fuzzier
file-sharing semantics
Because the data on which an access operation is performed
is timing dependent
◦ Check on file open
In this method, a client's cache entry is validated only when
the client opens the corresponding file for use
This method is suitable for supporting session semantics
a simple way of implementing session semantics in a
distributed file system is
to use the file-level transfer model coupled with the write-on-close
modification propagation policy and the check-on-file-open cache
validation policy, as sketched below
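A sketch of the check-on-file-open validation step; the server interface (last_modified, fetch) is a hypothetical stand-in:

```python
def open_with_validation(cache, server, filename):
    """Check-on-open cache validation (illustrative sketch).

    The cached copy is reused only if its recorded timestamp still matches
    the master copy's modification time; otherwise it is fetched again.
    """
    entry = cache.get(filename)
    master_mtime = server.last_modified(filename)  # one server check per open
    if entry is not None and entry["mtime"] == master_mtime:
        return entry["data"]                       # cached copy is still valid
    data = server.fetch(filename)                  # invalid or missing: refetch
    cache[filename] = {"data": data, "mtime": master_mtime}
    return data
```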
32. 2. Server-Initiated Approach
In this method,
◦ a client informs the file server when opening a file, indicating
whether the file is being opened for reading, writing, or both
◦ The file server keeps a record of which client has which file
open and in what mode
◦ the server keeps monitoring the file usage modes being used by
different clients
◦ and reacts whenever it detects a potential for inconsistency.
◦ A potential for inconsistency occurs when two or more clients
try to open a file in conflicting modes
◦ For example,
if a file is open for reading, other clients may be allowed to open it for
reading without any problem,
But opening it for writing cannot be allowed
◦ a new client should not be allowed to open a file in any mode if
the file is already open for writing
◦ When a client closes a file,
it sends an intimation to the server along with any modifications made
to the file.
On receiving such an intimation, the server updates its record of which
client has which file open in what mode
33. ◦ server-initiated approach has the following
problems
It violates the traditional client-server model in which
servers simply respond to service request activities
initiated by clients
It requires that file servers be stateful
A check-on-open, client-initiated cache validation
approach must still be used along with the server-
initiated approach
34. FILE REPLICATION
A replicated file is a file that has multiple copies
each copy located on a separate file server
Each copy of the set of copies that comprises a
replicated file is referred to as a replica of the
replicated file
Difference between Replication and Caching
◦ A replica is associated with a server, whereas a cached
copy is normally associated with a client
◦ The existence of a replica normally depends on availability
and performance requirements, whereas the existence of a
cached copy depends primarily on the locality in file access
patterns
◦ A replica is more persistent, widely known, secure,
available, complete, and accurate, whereas a cache is not
persistent
◦ A cached copy is contingent upon a replica; only by
periodic revalidation with respect to a replica can a cached
copy be useful
35. Advantages of Replication
1. Increased availability
◦ the system remains operational and available to the
users despite failures
◦ By replicating critical data on servers with
independent failure modes, the probability that one
copy of the data will be accessible increases
◦ alternate copies of a replicated data can be used
when the primary copy is unavailable
2. Increased reliability
◦ Due to the presence of redundant information in the
system, recovery from catastrophic failures becomes
possible
3. Improved response time
◦ Replication also helps in improving response time
◦ it enables data to be accessed either locally or from a
node to which access time is lower than the primary
copy access time
36. 4. Reduced network traffic
◦ If a file's replica is available with a file server that resides
on a client's node,
◦ the client's access requests can be serviced locally
5. Improved system throughput
◦ Replication also enables several clients' requests for
access to the same file to be serviced in parallel by
different servers
6. Better scalability
◦ By replicating the file on multiple servers,
◦ the same requests can now be serviced more efficiently
by multiple servers due to workload distribution
7. Autonomous operation
◦ in a distributed system that provides file replication as a
service to its clients,
◦ all files required by a client for operation during a limited
time period may be replicated on the file server residing on
the client's own node, allowing it to operate autonomously
for that period
37. Replication Transparency
A replicated file service must function
exactly like a nonreplicated file service
replication of files should be designed to be
transparent to the users
multiple copies of a replicated file appear as
a single logical file to its users
the read, write, and other file operations
should have the same client interface
Two important issues related to replication
transparency are
◦ naming of replicas
◦ and replication control
38. Naming of Replicas
the replication transparency requirement calls for
the assignment of a single identifier to all replicas of
an object
For immutable objects this poses no problem, since
there is effectively only one logical object with a given
identifier
For mutable objects, different copies of a replicated
object may not be the same (consistent) at a
particular instant of time
If all replicas are consistent, the mapping need only
provide the locations of all replicas
and a mechanism to identify the relative distances of
the replicas from the user's node
For mutable objects, it is
the responsibility of the naming system to map a user-
supplied identifier into the appropriate replica of the
object
39. Replication Control
Replication control includes determining the
number and locations of replicas of a replicated file
normally, replication control is handled entirely
automatically, in a user-transparent manner
under certain circumstances, it is desirable to expose
these details to users and to provide them with the
flexibility to control the replication process
if replication facility is provided to support
autonomous operation of workstations,
users should be provided with the flexibility to
create a replica of the desired files on their local
nodes
Depending on whether replication control is user
transparent or not, the replication process is of two
types
1. Explicit replication
2. Implicit/lazy replication
40. 1. Explicit replication
◦ users are given the flexibility to control the entire replication process
◦ when a process creates a file, it specifies the server on which the file
should be placed
◦ if desired, additional copies of the file can be created on other servers
on explicit request by the users
◦ Users also have the flexibility to delete one or more replicas of a
replicated file
2. Implicit/lazy replication
◦ the entire replication process is automatically controlled by the
system without users' knowledge
◦ when a process creates a file, it does not provide any information
about its location
◦ The system automatically selects one server for the placement of the
file
◦ the system automatically creates replicas of the file on other servers,
based on some replication policy used by the system
◦ The system automatically deletes any extra copies when they are no
longer needed
◦ Lazy replication is normally performed in the background when the
server has some free time
41. Multicopy Update Problem
maintaining consistency among copies
when a replicated file is updated is the major
design issue of a file system that supports
replication of files
commonly used approaches to handle this
issue are
◦ Read-Only Replication
◦ Read-Any-Write-All Protocol
◦ Available-Copies Protocol
◦ Primary-Copy Protocol
◦ Quorum-Based Protocols
42. Read-Only Replication
◦ allows the replication of only immutable files
◦ immutable files can be used only in the read-only mode
◦ because mutable files cannot be replicated,
◦ the multicopy update problem does not arise
◦ this approach suits files that are frequently read and
modified only once in a while
Read-Any-Write-All Protocol
◦ a simple replication scheme that can support the
replication of mutable files is the read-any-write-all
protocol
◦ In this method
a read operation on a replicated file is performed by reading any
copy of the file and a write operation by writing to all copies of the
file
Some form of locking has to be used to carry out a write operation
before updating any copy,
all copies are locked,
then they are updated,
and finally the locks are released to complete the write
43. Available-Copies Protocol
◦ The main problem with the read-any-write-all
protocol is
a write operation cannot be performed if any of the servers
having a copy of the replicated file is down at the time of the
write operation
◦ The available-copies protocol relaxes this restriction
and allows write operations to be carried out
◦ In this method,
a read operation is performed by reading any available copy,
but a write operation is performed by writing to all available
copies
◦ The basic idea is
when a server recovers after a failure,
It brings itself up to date by copying from other servers
before accepting any user request
Failed servers (sites) are dynamically detected by high-
priority status management routines
and configured out of the system while newly recovered
sites are configured back in
44. Primary-Copy Protocol
◦ In this protocol,
for each replicated file, one copy is designated as the primary copy and all
the others are secondary copies.
Read operations can be performed using any copy, primary or secondary
But, all write operations are directly performed only on the primary copy
Each server having a secondary copy updates its copy either
by receiving notification of changes from the server having the primary
copy
or by requesting the updated copy from it
◦ for UNIX-like semantics
when the primary-copy server receives an update request,
it immediately orders all the secondary-copy servers to update their
copies
◦ A fuzzier consistency semantics
if a write operation completes as soon as the primary copy has been
updated
The secondary copies are then lazily updated either in the background or
when requested for an updated version by their servers
45. Quorum-Based Protocols
◦ The read-any-write-all and available-copies protocols
cannot handle the network partition problem, in
which the copies of a replicated file are partitioned
into two or more active groups
◦ a quorum-based protocol is capable of handling the
network partition problem
◦ and can also increase the availability of write operations
◦ A quorum-based protocol works as follows
suppose there are a total of n copies of a replicated file F
To read the file, a minimum of r copies of F have to be
consulted.
This set of r copies is called a read quorum
to perform a write operation on the file, a minimum of w copies
of F have to be written
This set of w copies is called a write quorum
the constraint on the values of r and w is that the sum of the
read and write quorums must be greater than the total number
of copies n (r + w > n)
46. the quorum protocol does not require that write
operations be executed on all copies of a
replicated file
therefore, it becomes necessary to be able to
identify a current (up-to-date) copy in a
quorum
This is achieved by associating a version
number attribute with each copy
The version number of a copy is updated every
time the copy is modified
A copy with the largest version number in a
quorum is current
The new version number assigned to each copy
is one more than the version number associated
with the current copy
47. A read is executed as follows:
1. Retrieve a read quorum (any r copies) of F.
2. Of the r copies retrieved, select the copy with
the largest version number.
3. Perform the read operation on the selected
copy.
A write is executed as follows:
1. Retrieve a write quorum (any w copies) of F.
2. Of the w copies retrieved, get the version
number of the copy with the largest version
number.
3. Increment the version number.
4. Write the new value and the new version
number to all the w copies of the write
quorum.
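These read and write procedures can be sketched as follows (an illustrative in-memory model, not a real replicated file service). Because r + w > n, every read quorum overlaps every write quorum, so the largest version number found in a read quorum always identifies a current copy:

```python
import random

class Replica:
    """One copy of a replicated file: a value plus a version number."""
    def __init__(self):
        self.value = None
        self.version = 0

def quorum_read(replicas, r):
    """Retrieve any r copies and read from the one with the largest version."""
    quorum = random.sample(replicas, r)
    return max(quorum, key=lambda rep: rep.version).value

def quorum_write(replicas, w, new_value):
    """Write the new value, with an incremented version, to any w copies."""
    quorum = random.sample(replicas, w)
    new_version = max(rep.version for rep in quorum) + 1
    for rep in quorum:
        rep.value = new_value
        rep.version = new_version

# With n = 5, choosing r = 3 and w = 3 satisfies r + w > n.
replicas = [Replica() for _ in range(5)]
quorum_write(replicas, 3, "balance = 100")
print(quorum_read(replicas, 3))   # always sees the latest committed write
```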
49. Several special protocols can be derived from the generalized
quorum protocol; a few are:
1. Read-any-write-all protocol
◦ The read-any-write-all protocol is actually a special case of
the generalized quorum protocol with r=1 and w = n.
◦ This protocol is suitable for use when the ratio of read to
write operations is large
2. Read-all-write-any protocol
◦ For this protocol r=n and w= 1
◦ This protocol may be used in those cases where the ratio of
write to read operations is large
3. Majority-consensus protocol
◦ In this protocol, the sizes of both the read quorum and the
write quorum are made either equal or nearly equal
4. Consensus with weighted voting
◦ each copy is assigned some number of votes; a read quorum of
r votes is collected to read a file and a write quorum of w votes
to write a file
◦ Since the votes assigned to each copy are not necessarily the
same, the size of a read/write quorum depends on the copies
selected for the quorum
50. FAULT TOLERANCE
Various types of faults could harm the integrity of the
data stored
a processor loses the contents of its main memory in the
event of a crash
◦ making the data that are stored by the file system inconsistent
during request processing, the server or client machine
may crash
◦ resulting in the loss of state information of the file being
accessed
The concepts related to the ability of a distributed file system to tolerate faults are as follows
1. Availability
◦ Availability of a file refers to the fraction of time for which the
file is available for use
◦ availability property depends on the location of the file and
the locations of its clients (users)
◦ Replication is a primary mechanism for improving the
availability of a file
51. 2. Robustness
◦ Robustness of a file refers to its power to survive crashes
of the storage device and decays of the storage medium
on which it is stored
◦ Storage devices that are implemented by using
redundancy techniques
◦ stable storage device, are often used to store robust files
◦ a robust file may not be available until the faulty
component has been recovered
◦ robustness is independent of either the location of the file
or the location of its clients
3. Recoverability
◦ Recoverability of a file refers to its ability to be rolled
back to an earlier, consistent state when an operation on
the file fails or is aborted by the client
◦ Atomic update techniques such as a transaction
mechanism are used to implement recoverable files
52. Stable Storage
In the context of crash resistance capability, storage may
be broadly classified into three types:
1. Volatile storage, such as RAM, which cannot
withstand power failures or machine crashes
2. Nonvolatile storage, such as a disk, which can
withstand CPU failures but cannot withstand
transient I/O faults and decay of the storage media
3. Stable storage, which can even withstand transient
I/O faults and decay of the storage media
The basic idea of stable storage is to use duplicate
storage devices to implement a stable device
and to ensure that any period during which only one of the
two component devices is operational is significantly
shorter than the mean time between failures (MTBF) of
the devices
53. a disk-based stable-storage system consists of a pair of ordinary
disks (say disk 1 and disk 2) that are assumed to be decay
independent
Each block on disk2 is an exact copy of the corresponding block on
disk 1
effective fault tolerance facilities are provided to ensure that both
the disks are not damaged at the same time
As with conventional disks,
◦ the two basic operations related to a stable disk are read and write.
◦ A read operation first attempts to read from disk 1.
◦ If it fails, the read is done from disk 2.
◦ A write operation writes to both disks, but the write to disk 2 does not
start until that for disk 1 has been successfully completed
◦ This is to avoid the possibility of both disks getting damaged at the
same time by a hardware fault
recovery action compares the contents of the two disks block by
block
Whenever two corresponding blocks differ, the block having
incorrect data is regenerated from the corresponding block on the
other disk
Which of the two copies of a block holds the correct data
depends on the point at which the crash occurred
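A sketch of a stable disk built from two dict-like block stores. The write ordering and block-by-block recovery follow the description above; deciding which copy is correct after a crash is simplified here (a real system would use checksums written with each block):

```python
class StableDisk:
    """Stable-storage sketch over two decay-independent block stores."""

    def __init__(self, disk1, disk2):
        self.disk1, self.disk2 = disk1, disk2  # dict: block number -> bytes

    def write(self, block_no, data):
        self.disk1[block_no] = data  # write to disk 1 first; only after it
        self.disk2[block_no] = data  # succeeds is the write done on disk 2

    def read(self, block_no):
        try:
            return self.disk1[block_no]    # attempt the read from disk 1
        except KeyError:                   # stand-in for a real I/O failure
            return self.disk2[block_no]    # fall back to disk 2

    def recover(self):
        """Compare the disks block by block and repair any mismatch."""
        for block_no in set(self.disk1) | set(self.disk2):
            d1 = self.disk1.get(block_no)
            d2 = self.disk2.get(block_no)
            if d1 != d2:
                # Simplification: trust disk 1 when it has data, since writes
                # go there first; checksums would identify the bad copy.
                good = d1 if d1 is not None else d2
                self.disk1[block_no] = self.disk2[block_no] = good
```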
54. Effect of Service Paradigm on Fault Tolerance
A server may be implemented by using any one of the
following two service paradigms
◦ Stateful
◦ Stateless
Stateful File Servers
◦ A stateful file server maintains clients' state information
from one access request to the next
◦ This state information is subsequently used when
executing subsequent requests
◦ To help the server decide how long to retain the state
information of a client, all access requests for a file by a
client are performed within an open and a close
operation, together called a session
◦ The server creates state information for a client when the
client starts a new session by performing an open
operation
◦ discards the state information when the client closes the
session by performing a close operation
◦ To illustrate how a stateful file server works, consider the following file operations
55. Open (filename, mode):
◦ This operation is used to open a file identified by filename in the
specified mode
◦ When the server executes this operation, it creates an entry for this file
in a file-table
◦ it uses for maintaining the file state information of all the open files
◦ When a file is opened, its read-write pointer is set to zero
◦ the server returns to the client a file identifier (fid) that is used by the
client for subsequent accesses to that file
Read (fid, n, buffer):
◦ This operation is used to get n bytes of data from the file identified by fid
into the specified buffer
Write (fid, n, buffer):
◦ On execution of this operation, the server takes n bytes of data from the
specified buffer
Seek (fid, position):
◦ This operation causes the server to change the value of the read-write
pointer of the file identified by fid to the new value specified as position
Close (fid):
◦ This statement causes the server to delete from its file-table the file state
information of the file identified by fid
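A sketch of a server implementing these operations (write is analogous to read); the file table maps each fid to the filename and the server-side read-write pointer, and this per-client state is exactly what makes the server stateful:

```python
import itertools

class StatefulFileServer:
    """Stateful-server sketch: the file table holds per-session state."""

    def __init__(self, files):
        self.files = files                    # filename -> bytes
        self.file_table = {}                  # fid -> [filename, read-write pointer]
        self.next_fid = itertools.count(1)

    def open(self, filename, mode="r"):
        fid = next(self.next_fid)
        self.file_table[fid] = [filename, 0]  # pointer starts at zero
        return fid                            # client quotes fid on later requests

    def read(self, fid, n):
        filename, pos = self.file_table[fid]
        data = self.files[filename][pos:pos + n]
        self.file_table[fid][1] = pos + len(data)  # server advances the pointer
        return data

    def seek(self, fid, position):
        self.file_table[fid][1] = position

    def close(self, fid):
        del self.file_table[fid]              # session state is discarded
```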
57. Stateless File Servers
◦ A stateless file server does not maintain any client
state information
◦ each request identifies the file and the position in the
file for the read/write access
◦ the following set of operations on files is stateless:
◦ Read (filename, position, n, buffer):
the server returns to the client n bytes of data of the file
identified by filename.
The returned data is placed in the specified buffer
The position within the file from where to begin reading is
specified as the position parameter
◦ Write (filename, position, n, buffer):
It takes n bytes of data from the specified buffer and writes it
into the file identified by filename
The position parameter specifies the byte position within the
file from where to start writing
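The equivalent stateless sketch keeps no file table; every request names the file and position explicitly, which also makes the operations idempotent (safe to resend):

```python
class StatelessFileServer:
    """Stateless-server sketch: every request is self-contained."""

    def __init__(self, files):
        self.files = files  # filename -> bytearray

    def read(self, filename, position, n):
        # Idempotent: resending the request returns exactly the same bytes.
        return bytes(self.files[filename][position:position + n])

    def write(self, filename, position, data):
        # Idempotent: repeating the write leaves the file in the same state.
        self.files[filename][position:position + len(data)] = data
```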
59. Advantages of Stateless Service Paradigm in Crash
Recovery
Stateless servers:
◦ Crash recovery is very easy because no client state
information is maintained by the server and each request
contains all the information that is necessary to complete
the request
◦ When a server crashes while serving a request, the client
need only resend the request until the server responds,
and the server does no crash recovery at all
◦ When a client crashes during request processing, no
recovery is necessary for either the client or the server
◦ The paradigm suffers from the drawbacks of longer
request messages and slower processing of requests,
because a stateless server does not maintain any state
information to speed up the processing
Stateful servers:
◦ The stateful service paradigm requires complex crash
recovery procedures; both client and server need to
reliably detect crashes
◦ The server needs to detect client crashes so that it can
discard any state it is holding for the client, and the
client must detect server crashes
60. The stateless service paradigm imposes the
following constraints on the design of the
distributed file system
1. Each request of the stateless service paradigm
identifies the file by its filename instead of a low-
level file identifier
If the translation of remote names to local names is done
for each request, the request processing overhead will
increase.
To avoid the translation process, each file should have a
system wide unique low-level name associated with it
2. The retransmission of requests by clients requires
that the operations supported by stateless servers
be idempotent
Self-contained read and write operations are idempotent
operations to delete a file should also be made idempotent
if the stateless service paradigm is used
61. ATOMIC TRANSACTIONS
An atomic transaction is a computation consisting of
a collection of operations that take place indivisibly
in the presence of failures and concurrent
computations
Transactions help to preserve the consistency of a set
of shared data objects in the face of failures and
concurrent access
They make crash recovery much easier,
because a transaction can only end in two states
◦ Transaction carried out completely or
◦ Transaction failed completely
Transactions have the following properties:
1. Atomicity
2. Serializability
3. Permanence
62. 1. Atomicity
◦ This property ensures that all the operations of a
transaction appear to have been performed
indivisibly
◦ Two essential requirements for atomicity are
1. Failure atomicity
ensures that if a transaction's work is interrupted by a
failure, any partially completed results will be undone
Failure atomicity is also known as the all-or-nothing
property because a transaction is always performed either
completely or not at all
2. Concurrency atomicity
ensures that while a transaction is in progress, other
processes executing concurrently with the transaction
cannot modify or observe intermediate states of the
transaction
Concurrency atomicity is also known as consistency property
63. 2. Serializability
◦ This property (also known as isolation property)
ensures that concurrently executing transactions do
not interfere with each other
◦ The concurrent execution of a set of two or more
transactions is serially equivalent; that is, the
result of performing them concurrently is always the
same as if they had been executed one at a time in
some (system-dependent) order
3. Permanence
◦ This property (also known as durability property)
ensures that once a transaction completes
successfully, the results of its operations become
permanent
◦ And cannot be lost even if the corresponding process
or the processor on which it is running crashes
64. Need for Transactions In a File Service
Transactions in a file service are needed for
two main reasons:
1. For improving the recoverability of files in the
event of failures
2. For allowing the concurrent sharing of mutable
files by multiple clients in a consistent manner
Inconsistency Due to System Failure
◦ Consider a banking transaction that is
comprised of four operations (a1, a2, a3, a4) for
transferring $5 from account X to account Z
a1: read balance (x) of account X
a2: read balance (z) of account Z
a3: write (x - 5) to account X
a4: write (z + 5) to account Z
◦ If the system fails after a3 but before a4 completes,
account X will have been debited without account Z
being credited, leaving the accounts inconsistent
66. Inconsistency Due to Concurrent Access
Consider the following two banking transactions T1 and T2
Transaction T1, which is meant for transferring $5 from
account X to account Z, consists of four operations a1, a2, a3,
and a4
Transaction T2, which is meant for transferring $7 from
account Y to account Z, consists of four operations b1, b2, b3,
and b4
Assuming the initial balance in all the accounts is $100
In a base file service without transaction facility
◦ if the operations corresponding to the two transactions are
allowed to progress concurrently and
◦ if the file system makes no attempt to serialize the execution of
these operations,
◦ unexpected final results may be obtained
In a file service with transaction facility,
◦ the operations of each of the two transactions can be performed
indivisibly,
◦ producing correct results irrespective of which transaction is
executed first
69. Any interleaving of the operations of two or more concurrent
transactions is known as a schedule
All schedules that produce the same final result as if the
transactions had been performed one at a time in some serial
order are said to be serially equivalent
Serial equivalence is used as a criterion for the correctness of
concurrently executing transactions
70. Operations for Transaction-based File Service
a transaction consists of a sequence of elementary file access
operations such as read and write
The three essential operations for transaction service are as follows:
begin_transaction : returns (TID)
◦ Begins a new transaction and returns a unique transaction
identifier (TID)
◦ This identifier is used in other operations of this transaction
◦ All operations within a begin-transaction and an end-transaction
form the body of the transaction
end_transaction (TID) : returns (status)
◦ This operation indicates that, from the view point of the client, the
transaction completed successfully
◦ The returned status indicates whether the transaction has
committed or is inactive because it was aborted by either the
client or the server
abort_transaction (TID)
◦ Aborts the transaction, restores any changes made so far within
the transaction to the original values, and changes its status to
inactive
◦ A transaction is normally aborted in the event of some system
failure
71. In a file system with transaction facility
◦ Each file access operation of the transaction service
corresponds to an elementary file service operation
◦ The additional parameters of the file access
operations of the transaction service are the
transaction identifier (TID) of the transaction to
which the operation belongs
◦ the following are the file access operations of a
transaction service built on a stateless server for
byte-stream files
◦ Tread (TID, filename, position, n, buffer)
Returns to the client n bytes of the tentative data resulting
from transaction TID, if any has been recorded; otherwise it
has the same effect as Read (filename, position, n, buffer)
◦ Twrite (TID,filename, position, n, buffer)
Has the same effect as Write (filename, position, n, buffer)
but records the new data in a tentative form that is made
permanent only when the TID commits
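A sketch of how a server might support these operations, keeping Twrite data in tentative form per TID and making it permanent only on commit (an in-memory model; for simplicity, tentative data is looked up by exact file position):

```python
class TransactionalFileService:
    """Sketch: tentative writes per TID, made permanent only at commit."""

    def __init__(self, files):
        self.files = files        # filename -> bytearray (committed contents)
        self.tentative = {}       # tid -> {(filename, position): bytes}
        self.next_tid = 0

    def begin_transaction(self):
        self.next_tid += 1
        self.tentative[self.next_tid] = {}
        return self.next_tid      # the unique TID quoted by later operations

    def twrite(self, tid, filename, position, data):
        # Recorded tentatively; the committed file is untouched until commit.
        self.tentative[tid][(filename, position)] = bytes(data)

    def tread(self, tid, filename, position, n):
        # Return this TID's tentative data if recorded, else committed data.
        tentative = self.tentative[tid].get((filename, position))
        if tentative is not None:
            return tentative[:n]
        return bytes(self.files[filename][position:position + n])

    def end_transaction(self, tid):
        for (filename, position), data in self.tentative.pop(tid).items():
            buf = self.files[filename]
            buf[position:position + len(data)] = data  # now made permanent
        return "committed"

    def abort_transaction(self, tid):
        self.tentative.pop(tid, None)  # tentative values are simply discarded
        return "inactive"
```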
73. Recovery Techniques
From the point of view of a server, a transaction has
two phases
The first phase starts
◦ when the server receives a begin_transaction request
from a client
◦ The file access operations in the transaction are
performed and the client adds changes to file items
progressively
◦ On execution of the end_transaction or abort_transaction
operation, the first phase ends and the second phase
starts
In the second phase,
◦ the transaction is either committed or aborted
◦ In a commit, the changes made by the transaction to file
items are made permanent
◦ in an abort, the changes made by the transaction to file
items are undone to restore the files
74. while a transaction is in its first phase, and hence subject to
abortion, its updates must be recorded in a reversible
manner
The two commonly used approaches for recording file
updates in a reversible manner are
◦ the file versions approach
◦ the write-ahead log approach
75. File Versions Approach
when a transaction begins, the current file version is used
for all file access operations (within the transaction) that
do not modify the file
When a transaction commits, the changes made by it to a file
become public
the current version of a file is thus the version produced by the
most recently committed transaction
When the first operation that modifies the file is
encountered within the transaction,
◦ The server creates a tentative version of the file for the
transaction from the current file version
◦ performs the update operation on this version of the file
◦ all subsequent file access operations within the transaction are
performed on this tentative file version
◦ When the transaction is committed,
the tentative file version is made the new current version and the
previous current version of the file is added to the sequence of old
versions
◦ if the transaction is aborted,
the tentative file version is discarded and the current file version
remains unchanged
77. a transaction can modify more than one file
there is a tentative version of each file for the transaction
when one of the concurrent transactions commits,
◦ The tentative version corresponding to that transaction
becomes the current version of the file
if there are no serializability conflicts between this
transaction and the previously committed transactions,
◦ the tentative version corresponding to this transaction
is merged with the current version,
◦ creating a new current version that includes the changes made
by all of the transactions that have already committed
if there are serializability conflicts,
◦ all the transactions that are involved except the first one to
commit are aborted
A serializability conflict occurs when two or more
concurrent transactions are allowed to access the same
data items in a file and one or more of these accesses is a
write operation
78. Shadow Blocks Technique for Implementing File
Versions
The shadow blocks technique is an optimization that
allows the creation of a tentative version of a file
without the need to copy the full file
A file system uses some form of indexing mechanism to
allocate disk space to files
the entire disk space is partitioned into fixed-length
byte sequences called blocks
The file system maintains an index for each file and a
list of free blocks
The index for a particular file specifies the block
numbers and their exact sequence used for storing the
file data
the list of free blocks contains the block numbers that
are currently free and may be allocated to any file for
storing new data
79. In the shadow blocks technique,
◦ a tentative version of a file is created simply by copying the
index of the current version of that file; that is, a tentative
index of the file is created from its current index
◦ when a file update operation affects a block, a new disk
block is taken from the free list, the new tentative value is
written in it
◦ the old block number in the tentative index is replaced by
the block number of the new block
◦ File update operations that append new data to the file are
also handled in the same way
◦ The new blocks allocated to a tentative version of a file are
called shadow blocks
◦ Subsequent writes to the same file block by the transaction
are performed on the same shadow block
◦ if the transaction aborts,
the shadow blocks of the tentative version of the file are returned to
the list of free blocks and the tentative index is simply discarded
◦ if the transaction commits,
the tentative index is made the current index of the file and made
permanent
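A sketch of the technique for a single file and a single transaction: beginning a transaction copies only the index, and the first write to a file block allocates a shadow block from the free list and redirects the tentative index to it:

```python
class ShadowBlockFile:
    """Shadow-blocks sketch (one file, one transaction at a time)."""

    def __init__(self, blocks, index, free_blocks):
        self.blocks = blocks          # block number -> block data (the "disk")
        self.index = index            # current version: file block i -> block number
        self.free = set(free_blocks)  # block numbers available for allocation

    def begin(self):
        self.tentative = list(self.index)  # copy only the index, not the data
        self.shadows = {}                  # file block i -> shadow block number

    def write_block(self, i, data):
        if i not in self.shadows:          # first update to this file block:
            shadow = self.free.pop()       # take a new block from the free list
            self.shadows[i] = shadow
            self.tentative[i] = shadow     # redirect the tentative index to it
        self.blocks[self.tentative[i]] = data  # later writes reuse the shadow

    def commit(self):
        self.index = self.tentative       # tentative index becomes current

    def abort(self):
        self.free |= set(self.shadows.values())  # shadows back to the free list
        del self.tentative                # the tentative index is discarded

f = ShadowBlockFile(blocks={1: b"old"}, index=[1], free_blocks={2, 3})
f.begin()
f.write_block(0, b"new")
f.commit()   # f.index is now [2]; block 1 still holds the old version
```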
81. The Write-Ahead Log Approach
◦ for each operation of a transaction that modifies a file, a
record is first created and written to a log file known as a
write-ahead log
◦ A write-ahead log is maintained on stable storage and
contains a record for each operation that makes changes to
files
◦ Each record contains
the identifier of the transaction that is making the modification,
the identifier of the file that is being modified,
the items of the file that are being modified,
the old and new values of each item modified
◦ When the transaction commits,
a commit record is written to the write-ahead log
◦ if the transaction aborts,
the information in the write-ahead log is used to roll back the
individual file items to their initial values
◦ For rollback, the write-ahead log records are used one by
one, starting from the last record and going backward, to
undo the changes described in them
◦ The write-ahead log also facilitates recovery from crashes
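A sketch of a write-ahead log over a simple key-value store of file items; each record holds the TID, the item, and its old and new values, so a transaction can be rolled back by walking the log backward:

```python
class WriteAheadLog:
    """Write-ahead log sketch: log each update before applying it."""

    def __init__(self):
        self.records = []  # (tid, key, old value, new value); stable in reality

    def update(self, store, tid, key, new_value):
        # The record, including the old value, is written *before* the change.
        self.records.append((tid, key, store.get(key), new_value))
        store[key] = new_value

    def commit(self, tid):
        self.records.append((tid, "COMMIT", None, None))

    def rollback(self, store, tid):
        # Use the records one by one, starting from the last, to undo changes.
        for rec_tid, key, old, _new in reversed(self.records):
            if rec_tid == tid and key != "COMMIT":
                if old is None:
                    store.pop(key, None)  # the item did not exist before
                else:
                    store[key] = old

accounts = {"X": 100, "Z": 100}
log = WriteAheadLog()
log.update(accounts, tid=1, key="X", new_value=95)
log.update(accounts, tid=1, key="Z", new_value=105)
log.rollback(accounts, tid=1)  # accounts is back to {"X": 100, "Z": 100}
```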
83. Concurrency Control
Serializability is an important property of atomic transactions
It ensures that concurrently executing transactions do not interfere
with each other
to prevent data inconsistency due to concurrent access by multiple
transactions
every transaction mechanism needs to implement a concurrency
control algorithm
A good concurrency control mechanism allows maximum
concurrency with minimum overhead
The simplest approach for concurrency control would be to allow the
transactions to be run one at a time
◦ so that two transactions never run concurrently and hence there is
no conflict
◦ this approach does not allow any concurrency
Another simple approach to concurrency control is that two
transactions should be allowed to run concurrently only if they do
not use a common file
it is usually not possible to predict which data items will be used by a
transaction
flexible concurrency control algorithms are normally used by a
transaction mechanism
84. The most commonly used concurrency control techniques are
1. Locking
2. Optimistic concurrency control
3. Timestamps
Locking
◦ A transaction locks a data item before accessing it
◦ Each lock is labeled with the transaction identifier
◦ the transaction that locked the data item can access
it any number of times
◦ Other transactions that want to access the same data
item must wait until the data item is unlocked
◦ All data items locked by a transaction are unlocked
as soon as the transaction completes (commits or
aborts)
◦ Locking is performed by the transaction service as a
part of the data access operations
◦ clients have no access to operations for locking or
unlocking data items
85. Optimized Locking for Better Concurrency
◦ Two optimizations have been proposed for better concurrency
1. Type-specific locking
A simple lock that is used for all types of accesses to data items
reduces concurrency
Better concurrency can be achieved by using more than one
type of lock, based on the semantics of access operations
let us consider the two types of access operations read and write
when a transaction is accessing a data item in the read-only mode,
there is no reason to keep those transactions waiting that also want to access the
data item in the read-only mode
Instead of using a single lock for both read and write accesses
separate locks (read locks and write locks) should be used for the two
operations
With these two types of locks, the locking rules are: any number of read locks may be held on a data item at the same time, but a write lock excludes all other locks
86. 2. Intention-to-write locks
◦ a transaction has two phases
changes to data items are tentative in the first phase
and made permanent only in the second phase
◦ when a read lock is set,
a transaction should be allowed to proceed with its tentative writes until
it is ready to commit
◦ The value of the item will not actually change until the writing
transaction commits
◦ Gifford proposed the use of an "intention-to-write lock" (I-write)
and a commit lock instead of a write lock
◦ if a read lock is set,
an I-write lock is permitted on the data item and vice versa
◦ If an l-write lock is set,
no other transaction is allowed to have an I-write lock on the same data
item
◦ Setting a commit lock is
not permitted if any other type of lock is already set on the data item
◦ when a transaction having an I-write lock commits, its I-write
lock is converted to a commit lock
◦ if there are any outstanding read locks, the transaction must wait
until it is possible to set the commit lock
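The rules above amount to a lock compatibility table, sketched here as a mapping from (lock already held, lock requested) to whether the request may be granted (derived from the description, not Gifford's exact formulation):

```python
# (lock already held, lock requested) -> request permitted?
COMPATIBLE = {
    ("read",    "read"):    True,   # any number of readers may share an item
    ("read",    "I-write"): True,   # tentative writes may proceed under readers
    ("read",    "commit"):  False,  # commit must wait for outstanding read locks
    ("I-write", "read"):    True,   # and vice versa: readers allowed under I-write
    ("I-write", "I-write"): False,  # only one tentative writer at a time
    ("I-write", "commit"):  False,
    ("commit",  "read"):    False,  # a commit lock excludes every other lock
    ("commit",  "I-write"): False,
    ("commit",  "commit"):  False,
}
```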
88. Two-Phase Locking Protocol
two commonly encountered problems due to early
release of read locks and write locks are as follows
1. Possibility of reading inconsistent data in case of two or
more read accesses by the same transaction
2. Need for cascaded aborts
Aborting of already committed transactions when a transaction
aborts is known as cascaded aborting
To avoid the data inconsistency problems, transaction
systems use the two-phase locking protocol
In the first phase of a transaction, known as the
growing phase,
◦ all locks needed by the transaction are gradually acquired
in the second phase of the transaction, known as the
shrinking phase,
◦ the acquired locks are released
once a transaction has released any of its locks, it
cannot request any more locks on the same or other
data items
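A sketch of the protocol's core rule: a transaction object that refuses any lock request once it has entered its shrinking phase (a single-owner lock table keeps the example small):

```python
class TwoPhaseTransaction:
    """Two-phase locking sketch: no lock may be acquired after any release."""

    def __init__(self, lock_table):
        self.lock_table = lock_table  # data item -> owning transaction or None
        self.held = set()
        self.shrinking = False        # set once the first lock is released

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: the growing phase is over")
        if self.lock_table.get(item) is not None:
            raise RuntimeError(f"{item} is locked by another transaction")
        self.lock_table[item] = self
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True         # the shrinking phase has begun
        self.lock_table[item] = None
        self.held.discard(item)

    def finish(self):
        for item in list(self.held):  # release everything at commit/abort
            self.unlock(item)
```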
89. Granularity of Locking
The granularity of locking refers to the unit of lockable
data items
this unit is normally an entire file, a page, or a record
if locks can be applied only to whole files, concurrency
gets severely restricted due to the increased possibility
of false sharing
False sharing occurs when two different transactions
access two unrelated data items that reside in the same
file
A finer locking granularity increases concurrency by
reducing the possibility of false sharing
90. Handling of Locking Deadlocks
The locking scheme can lead to deadlocks.
A deadlock is a state
◦ in which a transaction waits for a data item locked by
another transaction that in turn waits, perhaps via a
chain of other waiting transactions, for the first
transaction to release some of its locks
For example, two transactions T1 and T2 have
locked data items D1 and D2 , respectively.
Now suppose that T1 requests a lock on D2 and
T2 requests a lock on D1
The commonly used techniques for handling
deadlocks are:
1. Avoidance
2. Detection
3. Timeouts
91. 1. Avoidance
◦ requests to lock data items are always made in a predefined
order so that there can be no cycle in the who-waits-for-whom
graph
2. Detection
◦ Deadlocks can be detected by constructing and checking the
who-waits-for-whom graph
◦ A cycle in the graph indicates the existence of a deadlock
◦ When such a cycle is detected,
the server must select and abort a transaction out of the transactions
involved in the cycle
3. Timeouts
◦ A timeout period with each lock
◦ A lock remains invulnerable for a fixed period, after which it
becomes vulnerable
◦ A data item with a vulnerable lock remains locked if no other
transaction is waiting for it to get unlocked
◦ Otherwise, the lock is broken (the data item is unlocked) and
the waiting process is permitted to lock the data item for
accessing it
◦ The transaction whose lock has been broken is normally
aborted
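Detection can be sketched as a cycle search in the who-waits-for-whom graph; for simplicity this assumes each transaction waits for at most one other, so the graph is a dict:

```python
def find_deadlock(waits_for):
    """Return a transaction on a cycle of the who-waits-for-whom graph,
    or None if there is no deadlock. waits_for: {txn: txn it waits for}."""
    for start in waits_for:
        seen = set()
        t = start
        while t in waits_for:     # follow the chain of waiting transactions
            if t in seen:
                return t          # a transaction was revisited: cycle found
            seen.add(t)
            t = waits_for[t]
    return None

# T1 holds D1 and waits for D2; T2 holds D2 and waits for D1:
print(find_deadlock({"T1": "T2", "T2": "T1"}))  # one of the deadlocked txns
```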
92. ◦ Three major drawbacks of the timeout approach are
it is hard to decide the length of the timeout period for a lock
in an overloaded system, the number of transactions getting aborted
due to timeouts will increase
the method favors short transactions
Optimistic Concurrency Control
In this approach, transactions are allowed to proceed
uncontrolled up to the end of the first phase
in the second phase,
◦ before a transaction is committed, the transaction is
validated to see if any of its data items have been changed
by any other transaction since it started
◦ The transaction is committed if found valid; otherwise it is
aborted
For the validation process, two records are kept of the
data items accessed within a transaction
◦ a read set that contains the data items read by the
transaction
◦ a write set that contains the data items changed, created, or
deleted by the transaction
93. To validate a transaction,
◦ its read set and write set are compared with the write sets of all
of the concurrent transactions that reached the end of their first
phase before it
◦ The validation fails if any data item present in the read set or
write set of the transaction being validated is also present in the
write set of any of the concurrent transactions
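This comparison reduces to a set-intersection test, as the following minimal sketch shows (the function and parameter names are illustrative):

    def validate(read_set, write_set, earlier_write_sets):
        # The transaction fails validation if any data item in its read
        # set or write set appears in the write set of a concurrent
        # transaction that finished its first phase before it.
        for other_writes in earlier_write_sets:
            if (read_set | write_set) & other_writes:
                return False      # conflict -> abort the transaction
        return True               # no conflict -> commit

    # T read D1 and wrote D2; an earlier concurrent transaction wrote D1.
    print(validate({"D1"}, {"D2"}, [{"D1"}]))   # False -> abort
    print(validate({"D1"}, {"D2"}, [{"D4"}]))   # True  -> commit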
Two main advantages of the optimistic concurrency
control approach are
1. It allows maximum parallelism because all transactions are
allowed to proceed independently in parallel without any need
to wait for a lock
2. It is free from deadlock
It suffers from the following drawbacks:
1. It requires that old versions of files corresponding to recently
committed transactions be retained for the validation process
2. Although it is free from deadlock, it may cause the starvation of a
transaction
3. Increased overhead of rerunning the aborted transactions
94. Timestamps
◦ each operation in a transaction is validated when it is
carried out
◦ If the validation fails, the transaction is aborted
immediately and it can then be restarted
◦ To perform validation at the operation level,
each transaction is assigned a unique timestamp at the
moment it does begin_transaction
every data item has a read timestamp and a write timestamp
associated with it
◦ When a transaction accesses a data item,
depending on the type of access (read or write),
the data item's read timestamp or write timestamp is
updated to the transaction's timestamp
◦ when a transaction is in progress,
there will be a number of data items with tentative values
and write timestamps
◦ The tentative values and timestamps become
permanent when the transaction commits
95. Before performing a read operation or a write
operation on a data item,
◦ the server performs a validation check by inspecting the
timestamps on the data item,
◦ including the timestamps on its tentative values that
belong to incomplete transactions
The rules for validation are as follows:
1. Validation of a Write Operation
◦ If the timestamp of the current transaction is either equal
to or more recent than the read and (committed) write
timestamps of the accessed data item, the write operation
passes the validation check
◦ if the timestamp of the current transaction is older than
the timestamp of the last read or committed write of the
data item, the validation fails
2. Validation of a Read Operation
◦ If the timestamp of the current transaction is more recent
than the write timestamps of all committed and tentative
values of the accessed data item, the read operation
passes the validation check
96. ◦ the read operation can be performed immediately
only
if there are no tentative values of the data item;
otherwise it must wait until the completion of the
transactions having tentative values of the data item
The validation check fails and the current
transaction is aborted in the following cases:
◦ The timestamp of the current transaction is older
than the timestamp of the most recent (committed)
write to the data item
◦ The timestamp of the current transaction is older
than that of a tentative value of the data item made
by another transaction, although it is more recent
than the timestamp of the permanent data item
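The two validation rules can be condensed into a small sketch; the outcome labels and the simplification of tentative values to a plain list of write timestamps are assumptions made here for illustration:

    def validate_write(ts, read_ts, committed_write_ts):
        # Rule 1: the write passes only if the transaction is not older
        # than the last read and the last committed write of the item.
        return ts >= read_ts and ts >= committed_write_ts

    def validate_read(ts, committed_write_ts, tentative_write_ts):
        # Rule 2 and the read cases described above.
        if ts < committed_write_ts:
            return "abort"   # older than the most recent committed write
        if any(ts < t for t in tentative_write_ts):
            return "abort"   # older than a tentative value of the item
        if tentative_write_ts:
            return "wait"    # must wait for the incomplete transactions
        return "read"        # can be performed immediately

    print(validate_write(ts=10, read_ts=8, committed_write_ts=7))   # True
    print(validate_read(ts=10, committed_write_ts=7,
                        tentative_write_ts=[9]))                    # wait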
97. Distributed Transaction Service
A distributed transaction service is an extension of
the conventional transaction service
It can support transactions involving files managed
by more than one server
When a transaction involves multiple servers, all the
servers need to communicate with one another
Servers coordinate their actions during the
processing of the transaction
A simple approach to coordinating these actions is to have
all client requests pass through a single server
To avoid unnecessary communication overhead
◦ a distributed transaction service normally allows client
requests to be sent directly to the server that holds the
relevant file
98. In a distributed transaction service,
◦ a client begins a transaction by sending a
begin_transaction request to any server
◦ The contacted server executes the
begin_transaction request and returns the resulting
TID to the client
◦ This server becomes the coordinator for the
transaction and is responsible for aborting or
committing it and for adding other servers called
workers
◦ Workers are dynamically added to the transaction
For this, a distributed transaction service has a new
operation add_transaction (TID, server_id of coordinator)
99. Before an access request is sent to a server
◦ An add_transaction request is sent to the server
◦ When the server receives the add_transaction
request,
it records the server identifier of the coordinator
makes a new transaction record containing the TID
initializes a new log to record the updates to local files from
the transaction
◦ makes a call to the coordinator to inform it of its
intention to join the transaction
In this manner,
◦ each worker comes to know about the coordinator
◦ the coordinator comes to know about and keeps a
list of all the workers involved in the transaction
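A minimal sketch of this joining protocol follows; the Coordinator and Worker classes and their method names (other than add_transaction, which the slides name) are illustrative assumptions:

    class Coordinator:
        def __init__(self, tid):
            self.tid = tid
            self.workers = []            # list of all workers involved

        def join(self, worker):
            self.workers.append(worker)  # learn about the new worker

    class Worker:
        def __init__(self, server_id):
            self.server_id = server_id
            self.coordinator = None
            self.log = []

        def add_transaction(self, tid, coordinator):
            self.coordinator = coordinator   # record coordinator's identity
            self.log = [("begin", tid)]      # new log for local file updates
            coordinator.join(self)           # announce intention to join

    coord = Coordinator(tid=42)         # server that handled begin_transaction
    w = Worker("server_B")
    w.add_transaction(42, coord)        # sent before any access request
    print([x.server_id for x in coord.workers])   # ['server_B']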
100. Two-Phase Multiserver Commit Protocol
A crucial part of the design of a distributed transaction service is the
committing of distributed transactions
Because the files changed within a transaction may be stored on multiple
servers, the commit protocol becomes more complicated
A crash of one server does not normally affect other servers
The general protocol for committing distributed transactions has
two phases
When the client of a distributed transaction makes an
end_transaction request,
the coordinator and the workers in the transaction have tentative
values in their logs
The coordinator is responsible for deciding whether the
transaction should be aborted or committed
if any server is unable to commit, the whole transaction must be
aborted
The end_transaction operation is performed in two phases:
◦ preparation phase
◦ commitment phase
101. Preparation Phase
1. The coordinator makes an entry in its log
that it is starting the commit protocol.
2. It then sends a prepare message to all the
workers telling them to prepare to
commit.
The message has a timeout value
associated with it.
3. When a worker gets the message, it checks
to see if it is ready to commit
If so, it makes an entry in its log and
replies with a ready message
Otherwise, it replies with an abort message
102. Commitment Phase
The commitment phase begins once the coordinator has received a ready
or abort reply from each worker or the prepare message has timed out
1. If all the workers are ready to commit, the transaction is
committed
The coordinator makes an entry in its log indicating that the transaction has been
committed
It then sends a commit message to the workers asking them to commit
The transaction is effectively completed, so the coordinator can report success to the
client
Otherwise, if any of the replies was abort or any worker's prepare message
timed out, the transaction is aborted
The coordinator makes an entry in its log indicating that the transaction has been
aborted
It then sends an abort message to the workers asking them to abort and reports
failure to the client
2. When a worker receives the commit message, it makes a
committed entry in its log and sends a committed reply to the
coordinator
3. When the coordinator has received a committed reply from all
the workers,
The transaction is considered complete, and all its records maintained by
the coordinator are erased
The coordinator keeps resending the commit message until it receives
the committed reply from all the workers
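The two phases can be summarized in a short sketch; the WorkerStub class and the return values are illustrative, and a real implementation would add logging to stable storage, timeouts, and retransmission of the commit message as described above:

    def two_phase_commit(coordinator_log, workers):
        # Preparation phase: log the start, then ask every worker to vote.
        coordinator_log.append("start-commit")
        votes = [w.prepare() for w in workers]

        # Commitment phase: commit only if every worker replied ready.
        if all(v == "ready" for v in votes):
            coordinator_log.append("committed")
            for w in workers:
                w.commit()
            return "success"                 # reported to the client
        coordinator_log.append("aborted")
        for w in workers:
            w.abort()
        return "failure"

    class WorkerStub:
        def __init__(self, ready):
            self.ready = ready
        def prepare(self):
            return "ready" if self.ready else "abort"
        def commit(self):
            pass
        def abort(self):
            pass

    log = []
    print(two_phase_commit(log, [WorkerStub(True), WorkerStub(True)]))   # success
    print(two_phase_commit(log, [WorkerStub(True), WorkerStub(False)]))  # failure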
103. Nested Transactions
Nested transactions are a generalization in which a transaction
may be composed of other transactions called
subtransactions
A subtransaction may in turn have its own
subtransactions
Tree terminology is normally used in describing
relationships among the transactions
When a transaction starts, it consists of only one
transaction (process) called the top-level transaction
This transaction may fork off children, giving rise to
subtransactions
Each of these children may again fork off its own
children
When a transaction forks a subtransaction, it is called
the parent of the subtransaction
The subtransaction is referred to as its child
A transaction is an ancestor and a descendant of itself
104. Committing of Nested Transactions
◦ A transaction may commit only after all its descendants
have committed
◦ A transaction may abort at any time
◦ For an entire transaction family to commit, its top-level
transaction must wait for the other transactions in the family
to commit
◦ A subtransaction appears atomic to its parent
◦ The changes made to data items by the subtransaction
become visible to its parent only after the subtransaction
commits and notifies this to its parent
◦ if a failure occurs that causes a subtransaction to abort
before its completion
all of its tentative updates are undone, and its parent is notified
◦ The parent may then choose to continue processing and
try to complete its task using an alternative method or it
may abort itself
◦ if a failure causes an ancestor transaction to abort,
the updates of all its descendant transactions (that have already
committed) have to be undone
105. ◦ No updates performed within an entire transaction
family are made permanent
until the top-level transaction commits
◦ Only after the top-level transaction commits is success
reported to the client
Advantages of Nested Transactions
1. It allows concurrency within a transaction
a transaction may generate several subtransactions that run in
parallel on different processors
because all children of a parent transaction are synchronized,
the parent transaction exhibits serializability
2. It provides greater protection against failures, in
that it allows checkpoints to be established within a
transaction
when a subtransaction aborts,
its parent can still continue and may fork an alternative
subtransaction in place of the failed subtransaction in order to
complete its task
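A minimal sketch of the commit and abort rules for a transaction tree (the Subtransaction class and its fields are illustrative assumptions):

    class Subtransaction:
        def __init__(self, name, parent=None):
            self.name, self.parent, self.children = name, parent, []
            self.state = "active"
            if parent:
                parent.children.append(self)

        def commit(self):
            # A transaction may commit only after all its descendants
            # have committed or aborted.
            assert all(c.state in ("committed", "aborted")
                       for c in self.children), "children still active"
            self.state = "committed"

        def abort(self):
            # Aborting an ancestor undoes the (even committed) updates
            # of all its descendants.
            self.state = "aborted"
            for c in self.children:
                if c.state != "aborted":
                    c.abort()

    top = Subtransaction("T")                  # top-level transaction
    child = Subtransaction("T1", parent=top)   # forked subtransaction
    child.commit()    # changes become visible to the parent only now
    top.commit()      # only now are the family's updates made permanent
    print(top.state, child.state)              # committed committed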
106. DESIGN PRINCIPLES
1. Clients have cycles to burn
◦ if possible, it is always preferable to perform an operation on a
client's own machine rather than performing it on a server
machine
◦ This principle aims at enhancing the scalability of the design
2. Cache whenever possible
◦ Caching of data at clients' sites frequently improves overall
system performance because it makes data available wherever
it is being currently used
◦ Saving a large amount of computing time and network
bandwidth
◦ Improves performance, scalability, user mobility, and site
autonomy
3. Exploit usage properties
◦ Files should be grouped into a small number of easily
identifiable classes
◦ Class-specific properties should be exploited for independent
optimization for improved performance
107. 4. Minimize systemwide knowledge and change
◦ Aimed at enhancing the scalability of design
◦ Monitoring or automatically updating of global
information should be avoided as far as practicable
◦ The following approaches are used to realize this principle
The callback approach for cache validation
The use of negative rights in an access control list (ACL) based
access control mechanism
Hierarchical system structure
5. Trust the fewest possible entities
◦ Aimed at enhancing the security of the system
◦ Security should be based on the integrity of the much
smaller number of servers rather than on trusting
thousands of clients
6. Batch if possible
◦ Helps in improving performance greatly
◦ Grouping operations together can improve throughput
◦ Transfer of data across the network in large chunks
rather than as individual pages is much more efficient