This document describes a system called Disconnected Distributed YFS that allows clients to operate offline by caching file system data locally. The system supports two modes: normal mode, in which clients are connected, and disconnected mode, in which operations are served from locally cached data. Conflicts are detected on reconnection and resolved automatically for directories when possible, while a repair agent supports manual resolution of file conflicts. The key components, the YFS client, lock server, extent server, and repair agent, are implemented to support disconnected operation, caching, synchronization on reconnection, and conflict handling.
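The reconnection-time conflict check described above can be sketched as a comparison between the server version a client last saw and the server's current version. This is only an illustrative model: the names `CachedEntry` and `detect_conflict` are assumptions for this sketch, not part of the YFS code base.

```python
from dataclasses import dataclass

@dataclass
class CachedEntry:
    extent_id: int
    base_version: int   # server version observed when the entry was cached
    data: str
    dirty: bool         # True if modified while disconnected

def detect_conflict(entry: CachedEntry, server_version: int) -> str:
    """Classify a cached entry when the client reconnects."""
    if not entry.dirty:
        return "clean"          # nothing to push back to the server
    if entry.base_version == server_version:
        return "fast-forward"   # server unchanged: local edit can be uploaded
    return "conflict"           # both sides changed: needs resolution

# The client cached version 3, edited it offline, and the server has
# meanwhile advanced to version 5 -> a true conflict.
print(detect_conflict(CachedEntry(7, 3, "local edit", True), 5))
```

In this model, "conflict" results on a directory would go to the automatic merge path, while file conflicts would be handed to the repair agent for manual resolution.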
ABSTRACT
Disconnected operation refers to the ability of a distributed
system client to operate despite server inaccessibility by
emulating services locally. This work extends YFS to make it
highly available by supporting disconnected operation at YFS
clients. A YFS client works in one of two modes: normal mode,
in which the client is connected to the network, and disconnected
mode, in which the client is disconnected from the network, either
voluntarily or because of a network failure. We allow all YFS
client operations in disconnected mode as long as the files
involved are found in the cache; an operation fails if the file
being worked on does not exist in the client's cache. Thus all
disconnected operations are transparent to the user unless a cache
miss occurs. Similarly, the return to normal operation is
transparent unless a conflict is detected. We attempt to resolve
conflicts on directories automatically, with the exception of
same-file-name conflicts. We do not resolve conflicts on files;
instead we provide a repair agent that the user runs to resolve
such conflicts manually.
DESIGN & IMPLEMENTATION
In the following sections we describe the design and
implementation of our YFS client, lock server, extent server, and
repair agent, and then we conclude.
YFS Client
Writeback cache
We implement a write-back cache in the YFS client. While the
client is connected, a dirty cache line is flushed whenever the
client releases the lock on the extent held in that line. When a
disconnected client becomes connected again, the sync thread
flushes all dirty cache lines. Note that after flushing a cache
line we do not erase the line from the cache; instead we register
a callback update request with the extent server.
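The flush behavior above can be sketched as follows. This is a minimal illustrative Python sketch, not our actual implementation (the real YFS client is written in C++); the class and method names, and the server stub interface, are assumptions for illustration only.

```python
class ExtentCache:
    """Illustrative write-back extent cache. Dirty lines are flushed
    on lock release (connected mode) or all at once by the sync
    thread after reconnection. A flushed line stays cached, and an
    update callback is registered so the server can refresh it."""

    def __init__(self, server):
        self.server = server        # extent-server stub (hypothetical)
        self.lines = {}             # extent id -> content
        self.dirty = set()          # ids with unflushed local writes

    def put(self, eid, content):
        self.lines[eid] = content
        self.dirty.add(eid)         # defer the write to the server

    def flush(self, eid):           # called when the lock is released
        if eid in self.dirty:
            self.server.put(eid, self.lines[eid])
            self.dirty.discard(eid)
            # keep the line cached; ask the server to push updates
            self.server.register_update_callback(eid)

    def flush_all(self):            # called by the sync thread
        for eid in list(self.dirty):
            self.flush(eid)
```

The key design point is the last step of `flush`: the line survives the flush, so a later read hits the cache, and freshness is maintained by the server-side update callback rather than by invalidation.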
Hoarding
We implement the concept of hoarding in disconnected YFS. A
user can indicate that a certain set of files is critical for
working in disconnected mode by adding them to an input file read
by the YFS client. At startup the YFS client launches a thread
that wakes up every three minutes, reads the list of files to
hoard from the input file, and caches any that are not already
present in the cache. This feature gives users control over which
files will be available when they get disconnected.
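One pass of the hoard thread could look like the sketch below. It is illustrative only: the function name, the plain-dict cache, and the `fetch` parameter (standing in for the extent-server fetch path) are assumptions, not our actual C++ implementation.

```python
def hoard_pass(cache, hoard_file, fetch):
    """One pass of the hoard thread; in the design above, a
    background thread runs such a pass every three minutes. It reads
    the user's list of critical files and fetches any that are
    missing from the cache."""
    with open(hoard_file) as f:
        wanted = [line.strip() for line in f if line.strip()]
    fetched = []
    for path in wanted:
        if path not in cache:
            cache[path] = fetch(path)   # pull content from the server
            fetched.append(path)
    return fetched
```

Because the pass only fetches missing entries, repeated wake-ups are cheap once the hoard set is cached.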
Simulation in Disconnected mode
We want the user to be able to continue working in disconnected
mode. We therefore modified the lock client cache and the extent
client cache in the YFS client to simulate the lock server and the
extent server locally. This lets the user modify files in
disconnected mode, in the hope that the changes can be
reintegrated upon reconnection. It includes granting even those
locks that the client did not own when it became disconnected. In
disconnected mode we also allow the user to create new files or
directories, provided that the parent directory's contents are
present in the cache.
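The optimistic local granting of locks can be sketched as below. This is an illustrative Python sketch under assumed names (`LockClientCache`, the return strings, the server stub); the real lock client cache is C++ and its interface differs.

```python
class LockClientCache:
    """Illustrative lock-client cache. While connected, unknown
    locks are acquired from the lock server; while disconnected,
    every acquire is granted locally so the user can keep working,
    including locks this client did not own at disconnection time.
    Reintegration may later surface conflicts from such grants."""

    def __init__(self, server):
        self.server = server
        self.held = set()
        self.connected = True

    def acquire(self, lock_id):
        if lock_id in self.held:
            return "cached"             # already ours
        if self.connected:
            self.server.acquire(lock_id)
            self.held.add(lock_id)
            return "granted-by-server"
        self.held.add(lock_id)          # optimistic local grant
        return "granted-locally"
```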
Heartbeater and sync thread
The YFS client assumes that the extent server is always up and
keeps pinging it to maintain the client's connected or
disconnected state. The same thread also takes care of syncing the
YFS client with the lock server and the extent server. Upon
reconnection after a disconnection, it is the YFS client's
responsibility to learn the current state of the lock server and
to update the extent server with its changes. As soon as the YFS
client sees that the connection is re-established, it starts
syncing its state with the lock server and the extent server.
During syncing we do not permit further operations on the YFS
client: for example, if a new thread starts on the YFS client and
tries to read a file, we put that thread to sleep and wake it once
the sync process is over. This is important because the new thread
might get a lock from the cache, and while it is working on the
file the sync thread might mark that lock invalid, which would
break the locking semantics. During sync we first validate the
lock cache and then flush the extent cache, which may lead to
conflicts. If the flush of an extent causes a conflict (detected
by the extent server), that extent immediately becomes
inaccessible, is marked inconsistent on the extent server, and is
erased from the client's cache. If the extent server accepts a
flushed extent, we do not erase it from the client's cache;
instead we register a callback so that the server will update that
cache line. Once syncing is over, normal operation resumes.
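The two sync steps (validate locks, then flush extents) could be sketched as follows. The data structures and server interfaces here (`is_owner`, a boolean-returning `put`) are illustrative assumptions; the actual client uses C++ RPC stubs and blocks new operations for the duration of the sync.

```python
def sync_on_reconnect(held_locks, cache, dirty, lock_server, extent_server):
    """Illustrative sync sequence run when the heartbeater detects
    reconnection. Step 1 validates the lock cache against the lock
    server; step 2 flushes dirty extents, erasing those the server
    rejects as conflicting and registering update callbacks for the
    accepted ones."""
    for lock_id in list(held_locks):
        if not lock_server.is_owner(lock_id):   # ownership lost offline
            held_locks.discard(lock_id)
    conflicts = []
    for eid in list(dirty):
        if extent_server.put(eid, cache[eid]):  # accepted by the server
            extent_server.register_update_callback(eid)
        else:                                   # version conflict
            del cache[eid]                      # drop the local copy
            conflicts.append(eid)
        dirty.discard(eid)
    return conflicts
```

Validating locks before flushing matters: a flush under a lock the client no longer owns would race with whichever client now holds it.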
Update client's cache
We implement functionality that enables the YFS client to receive
updates for a cache line from the extent server. Whenever the YFS
client fetches a missing extent or flushes a dirty extent from its
cache, it can optionally register a callback request with the
extent server. Currently we register a callback for all fetched
and flushed extents; one could limit such registration to hoarded
files. After registration, the extent server guarantees to send
one update request to the client upon modification of the extent.
Note that the server sends an update request, not an invalidation
request.
Disconnected Distributed YFS (Yet another File System)
Nakul Manchanda, Uma Balasubramanian, and Arvind Kumar
{nm1157, ub263, ak2603} @cs.nyu.edu
Lock Server
We modified the YFS lock server to grant locks to other clients in
case the owner of a lock has become disconnected. The revoker
thread on the lock server assumes that a client is disconnected if
a revoke RPC to that client times out. Our current implementation
of the lock server also provides an interface that lets a client
ask whether it is the current owner of a lock; a YFS client uses
this interface during syncing to learn the current state of its
locks.
Extent Server
Update Callback Registration
Our extent server exposes an interface that lets a YFS client
register an update callback. Once a client registers, the server
sends it an update notification upon the first modification to the
extent after registration. Registration is one-time: after sending
an update request, the server forgets about that client. We chose
a one-time notification policy to avoid the burden of repeatedly
sending update notifications to disconnected clients; moreover, a
client may no longer be interested in notifications for a
particular extent after some time, so one-time notification makes
sense.
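A one-shot callback table of this kind could be sketched as below. This is an illustrative Python sketch; the class name is invented, and excluding the writing client from its own notification is an assumption of the sketch, not something the design above specifies.

```python
class UpdateRegistry:
    """Illustrative one-shot callback table on the extent server.
    Each registration is consumed by the first modification after
    it, so the server never re-notifies clients that may have
    disconnected in the meantime."""

    def __init__(self):
        self.watchers = {}              # extent id -> set of clients

    def register(self, eid, client):
        self.watchers.setdefault(eid, set()).add(client)

    def on_modify(self, eid, writer):
        # notify everyone registered except the writer, then forget
        waiting = self.watchers.pop(eid, set())
        return sorted(waiting - {writer})
```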
Conflict Detection
Conflict detection is done by the extent server. We augmented each
extent's metadata with a version number. Whenever the YFS client
fetches an extent, it also gets the extent's current version
number. Whenever the YFS client modifies an extent and sends it to
the extent server, it increments the version number by one. The
extent server compares the version of the received extent with
that of the existing content under the same extent id. If the
received version is less than or equal to the version of the
existing content, the server first tries to resolve the conflict
automatically; if that fails, it marks the content as inconsistent
and returns a conflict error code to the client. The YFS client
deletes the extent from its cache when the extent server returns
the conflict error code. Consider two disconnected clients working
on the same file at the same version, say four. When these clients
reconnect, both will send the file with version five; whichever
arrives first is accepted, and the other conflicts with the
now-existing content.
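The version check itself is small; a sketch under assumed names (the function and the `(content, version)` store layout are illustrative, not our server's actual representation):

```python
def put_extent(store, eid, content, version):
    """Illustrative server-side version check. `store` maps an
    extent id to a (content, version) pair; a put whose version is
    not strictly greater than the stored one is a conflict, which
    the caller then tries to resolve or marks inconsistent."""
    current_version = store.get(eid, (None, 0))[1]
    if version <= current_version:
        return False          # conflict error code goes to the client
    store[eid] = (content, version)
    return True
```

Replaying the two-client example: both start at version four, both send version five; the first put succeeds and bumps the stored version to five, so the second put's version is no longer strictly greater and is rejected.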
Conflict Resolution
Upon conflict detection, resolution happens on the extent server.
We try to resolve conflicts only on directories; all conflicts on
files are reported to the user. On the extent server we maintain a
version history for each directory. We need this history to know
the contents of a directory at the version immediately before the
conflicting one. Without that immediately preceding version, a
newly seen file is ambiguous: it may have been added by the client
that sent the conflicting version, or deleted by the client that
sent the latest existing version. To keep our conflict resolution
algorithm simple, we do not consider complicated scenarios in
which a disconnected user and a connected user delete and create
the same file multiple times. Thus on directories we successfully
resolve most conflicts, except same-file-name conflicts. All
conflicting copies that cannot be resolved without user
intervention are moved to a separate covolume, which is used by
the repair agent.
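The directory merge is essentially a three-way merge keyed on the version immediately before the conflict. The sketch below is illustrative (directory entries modeled as name-to-file-id dicts, names invented); it shows how the base version disambiguates "added by one client" from "deleted by the other", and how same-name creations on both sides are left unresolved.

```python
def merge_directories(base, latest, incoming):
    """Illustrative three-way merge of directory entries
    (name -> file id). `base` is the directory's contents at the
    version immediately before the conflict. Names that diverged on
    both sides are same-name conflicts, left for the repair agent."""
    merged, unresolved = {}, []
    for name in set(base) | set(latest) | set(incoming):
        l, i, b = latest.get(name), incoming.get(name), base.get(name)
        if l == i:                       # both sides agree on the entry
            if l is not None:
                merged[name] = l
        elif l is None or i is None:     # entry on exactly one side
            survivor = l if l is not None else i
            if survivor != b:            # added/recreated by that side
                merged[name] = survivor
            # else: unchanged on one side, deleted on the other -> gone
        else:                            # both sides have differing entries
            unresolved.append(name)      # same-name conflict
    return merged, sorted(unresolved)
```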
Repair Agent
Manual Conflict resolution Tool
We implement a repair agent that is used to resolve conflicting
extents. The agent talks to the extent server and acquires a lock
from the extent server itself. Once the lock is acquired, the
agent fetches all conflicting extents and prompts the user to
select the desired version of each extent, one by one. The agent
releases the lock after resolving the conflicts. We make the agent
acquire a lock to ensure that only one user resolves conflicts at
a time; otherwise the conflict resolution process itself might
lead to further conflicts. Once a particular conflict is resolved
manually, the extent immediately becomes accessible to YFS
clients.
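The agent's loop can be sketched as below. Everything here is illustrative: the function name, the server methods (`acquire_repair_lock`, `conflicts`, `resolve`), and the `choose` parameter standing in for the interactive prompt are assumptions, not the actual interface.

```python
def run_repair_agent(extent_server, choose):
    """Illustrative repair-agent loop. It takes the server-side
    repair lock so only one user resolves conflicts at a time, lets
    the user pick the desired version of each conflicting extent,
    and releases the lock when done. A resolved extent becomes
    accessible to YFS clients immediately."""
    extent_server.acquire_repair_lock()
    try:
        for eid, versions in sorted(extent_server.conflicts().items()):
            winner = choose(eid, versions)     # user selects a version
            extent_server.resolve(eid, winner)
    finally:
        extent_server.release_repair_lock()
```

The try/finally mirrors the requirement that the lock is released even if resolution is interrupted, so a crashed repair session does not block future ones forever (assuming the server can also time the lock out).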
CONCLUSION
We think that a disconnected distributed file system is a powerful
and useful concept. We enjoyed this work throughout and learned
about the many issues and scenarios that can arise in a
disconnected distributed file system: distributed deadlocks are
difficult to debug, and keeping the caches of connected clients in
sync is critically important. Our conflict resolution algorithm
could be further improved to cover complicated delete-and-create
scenarios, which we have not tested. As an optimization to reduce
the space taken by a directory's version history, the extent
server could truncate very old entries, but it would then need to
know the latest version of each directory held by each client; it
was not clear how much such an optimization would pay off, so we
did not implement it. In short, the project was as interesting as
our class, and we had fun doing it.