Coda file system
Coda (Constant Data Availability) is a distributed file system developed at Carnegie Mellon University. This presentation explains how it works and covers its main design aspects.

Published in: Technology

  • Security: secure channels; access control on directories
    Client caching: callback promise; callback break
    Scalability: whole-file caching; smart/unreliable clients, dumb/reliable servers; no system-wide election
    Disconnected operation: optimistic update strategy and basic conflict resolution; files can be edited from the local cache; stateful Venus cache
  • Clients have access to a single shared name space.
    Notice Client A and Client B!
  • Unit of replication: volume
    Volume Storage Group (VSG): set of servers that have a copy of a volume
    Accessible Volume Storage Group (AVSG): set of servers in VSG that the client can contact
    Replicas are versioned with version vectors
    Each vector has one entry for each server in the VSG
    When a file is updated, the entries for the servers in the AVSG are incremented
  • Coda has been designed for high availability, which is mainly reflected by its sophisticated support for client-side caching and its support for server replication. An interesting aspect of Coda that needs further explanation is how a client can continue to operate while being disconnected, even if disconnection lasts for hours or days.
    HOARDING
    The normal state of a client.
    The client is connected to at least one server that holds a copy of the volume.
    While in this state, the client can contact the server and issue file requests to perform its work. At the same time, it attempts to keep its cache filled with useful data.
    EMULATION
    Entered when the AVSG becomes empty (|AVSG| = 0).
    The behavior of a server for the volume is emulated on the client’s machine: all file requests are serviced directly from the locally cached copy of the file.
    Note that while a client is in the EMULATION state for one volume, it may still be able to contact servers that manage other volumes. In such cases, the disconnection will generally have been caused by a server failure rather than by the client being cut off from the network.
    REINTEGRATION
    When reconnection occurs, the client enters the REINTEGRATION state, in which it transfers its updates to the server to make them permanent.
    Conflicts are detected during reintegration and, where possible, automatically resolved. As shown in the figure, the connection with the server may be lost again during reintegration, which brings the client back into the EMULATION state.
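The three client states and the transitions between them can be sketched as a small state machine. This is only an illustration of the description above, not Venus code: the class `VolumeState` and its method names are hypothetical, and the state is tracked per volume because a client may be disconnected from one volume while still reaching the servers of another.

```python
from enum import Enum, auto


class State(Enum):
    HOARDING = auto()
    EMULATION = auto()
    REINTEGRATION = auto()


class VolumeState:
    """Hypothetical per-volume tracker of the client's state."""

    def __init__(self):
        # HOARDING is the normal, connected state.
        self.state = State.HOARDING

    def on_avsg_change(self, avsg_size: int) -> State:
        """React to a change in the number of reachable servers."""
        if avsg_size == 0:
            # No server of the volume is reachable: emulate the server
            # locally. This also covers a connection lost mid-reintegration.
            self.state = State.EMULATION
        elif self.state is State.EMULATION:
            # Reconnected: updates made while disconnected must be replayed.
            self.state = State.REINTEGRATION
        return self.state

    def on_reintegration_done(self) -> State:
        """All pending updates were transferred to the server."""
        if self.state is State.REINTEGRATION:
            self.state = State.HOARDING
        return self.state
```

Driving the tracker with the sequence "disconnect, reconnect, disconnect again, reconnect, finish" walks through every arrow mentioned in the notes, including the fall back from REINTEGRATION to EMULATION.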
  • Transcript

    • 1. Introduction to Coda File System • Naming and Location • Architecture • Caching and Replication • Synchronization • Communication • Fault Tolerance • Security • Summary
    • 2. Coda (constant data availability) is a distributed file system that was developed as a research project at Carnegie Mellon University in 1987 under the direction of Mahadev Satyanarayanan. Coda’s design goals: scalability, constant data availability, transparency, security, consistency.
    • 3. The name space in Coda is hierarchically structured as in UNIX and is partitioned into disjoint volumes. A volume consists of a set of files and directories located on one server, and is the unit of replication in Coda. Each file and directory is identified by a 96-bit unique file identifier (FID). Replicas of a file have the same FID.
    • 4. An FID has two components: (1) a 32-bit RVID (Replicated Volume Identifier) of the logical volume that the file is part of; (2) a 64-bit file handle (vnode) that uniquely identifies the file within a volume.
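The FID layout on slides 3 and 4 (32-bit RVID plus 64-bit file handle, 96 bits in total) can be illustrated with a small value type. This is a sketch under assumptions: the class `Fid`, the field order, and the big-endian byte order are chosen for illustration and are not taken from the Coda sources.

```python
from dataclasses import dataclass

RVID_BITS = 32   # replicated volume identifier
VNODE_BITS = 64  # file handle within the volume


@dataclass(frozen=True)
class Fid:
    """Hypothetical 96-bit file identifier: (RVID, vnode)."""

    rvid: int
    vnode: int

    def __post_init__(self):
        if not 0 <= self.rvid < (1 << RVID_BITS):
            raise ValueError("rvid must fit in 32 bits")
        if not 0 <= self.vnode < (1 << VNODE_BITS):
            raise ValueError("vnode must fit in 64 bits")

    def pack(self) -> bytes:
        """Serialize to 12 bytes; RVID first, big-endian (an assumption)."""
        return self.rvid.to_bytes(4, "big") + self.vnode.to_bytes(8, "big")

    @classmethod
    def unpack(cls, raw: bytes) -> "Fid":
        if len(raw) != 12:
            raise ValueError("a FID is exactly 12 bytes (96 bits)")
        return cls(int.from_bytes(raw[:4], "big"),
                   int.from_bytes(raw[4:], "big"))
```

Because the identifier names the logical volume rather than a particular server, every replica of a file packs to the same 12 bytes, which matches the statement that replicas share one FID.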
    • 5. Each file in Coda belongs to exactly one volume. A volume may be replicated across several servers. A logical (replicated) volume maps to one or more physical volumes, one on each server that holds a replica.
    • 6. Coda provides two main functions: (1) availability of files, by replicating a volume across several servers; (2) disconnected operation, by caching files on the client machine.
    • 7. The Coda file system distinguishes two types of nodes: (1) Vice nodes: dedicated file servers; (2) Virtue nodes: client machines.
    • 8. The internal organization of a Virtue workstation is designed to allow access to files even if the server is unavailable, and uses the Virtual File System (VFS) layer to intercept calls from client applications.
    • 9. Coda uses RPC2, a sophisticated reliable RPC system. The server starts a new thread for each request and periodically informs the client that it is still working on the request.
    • 10. Coda servers allow clients to cache whole files. Clients are notified of modifications made by other clients through invalidation messages, which call for multicast RPC. Figure: (a) sending invalidation messages one at a time; (b) sending invalidation messages in parallel.
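The callback mechanism behind these invalidation messages can be sketched as follows. This is a simplified illustration, not RPC2 or Vice code: the class `Server`, its methods, and the `send` transport callable are hypothetical, and a thread pool stands in for multicast RPC to show variant (b), where invalidations go out in parallel instead of one at a time.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class Server:
    """Hypothetical server that tracks callback promises per file."""

    def __init__(self, send):
        # send(client, filename) is an assumed transport hook that delivers
        # one invalidation (callback break) to one client.
        self._send = send
        self._promises = defaultdict(set)

    def fetch(self, client, filename):
        """Client caches the whole file and receives a callback promise."""
        self._promises[filename].add(client)

    def holders(self, filename):
        return set(self._promises[filename])

    def store(self, writer, filename):
        """Writer closes an updated file: break every other client's promise."""
        targets = sorted(self._promises[filename] - {writer})
        # Variant (b): send all invalidations in parallel and wait for all.
        with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
            list(pool.map(lambda client: self._send(client, filename), targets))
        # Broken promises are forgotten; the writer's own copy stays valid.
        self._promises[filename] -= set(targets)
        return targets
```

With sequential sending the server waits for each client in turn, so one slow or unreachable client delays all the others; the parallel form bounds the wait by the slowest single reply.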
    • 11. [Figure: timeline between two clients and a server for file f. Session A: Open (RD), file f transferred, Close. Session B: Open (WR), file f transferred, Close. The server sends an Invalidate message.]
    • 12. [Figure: timeline for file f with sessions A and C on Client A and sessions B and D on Client B. Labels: Open (RD), Open (WR), Close, Invalidate (callback break), file f transferred, OK (no file transfer).] • Scalability • Fault Tolerance
    • 13. Data structures: VSG (Volume Storage Group): the set of servers storing replicas of a volume. AVSG (Accessible Volume Storage Group): the subset of the VSG that a client can currently contact, tracked for every volume the client has cached.
    • 14. Versioning vector (Coda Version Vector) when the partition happens: [1,1,1]. Client A updates the file → versioning vector in its partition: [2,2,1]. Client B updates the file → versioning vector in its partition: [1,1,2]. Partition repaired → compare versioning vectors: conflict!
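The comparison on slide 14 can be reproduced with a few lines of code. This is a minimal sketch of generic version-vector comparison, assuming one counter per server in the VSG; the names `Order`, `compare`, and `bump` are illustrative and do not come from Coda.

```python
from enum import Enum


class Order(Enum):
    EQUAL = "equal"
    DOMINATES = "dominates"   # first vector is strictly newer
    DOMINATED = "dominated"   # first vector is strictly older
    CONFLICT = "conflict"     # concurrent updates in different partitions


def compare(a, b):
    """Compare two version vectors entry by entry."""
    if len(a) != len(b):
        raise ValueError("vectors must cover the same VSG")
    a_ahead = any(x > y for x, y in zip(a, b))
    b_ahead = any(y > x for x, y in zip(a, b))
    if a_ahead and b_ahead:
        return Order.CONFLICT
    if a_ahead:
        return Order.DOMINATES
    if b_ahead:
        return Order.DOMINATED
    return Order.EQUAL


def bump(vector, avsg):
    """Return a copy with the entries of the reachable servers incremented."""
    return [v + 1 if i in avsg else v for i, v in enumerate(vector)]


start = [1, 1, 1]
a = bump(start, {0, 1})   # Client A reaches servers 0 and 1 -> [2, 2, 1]
b = bump(start, {2})      # Client B reaches server 2 only   -> [1, 1, 2]
print(compare(a, b))      # Order.CONFLICT
```

Neither vector is greater than or equal to the other in every position, so neither update can simply overwrite the other and the conflict has to be resolved when the partition heals.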
    • 15. HOARDING: fill the file cache in advance with all files that will be accessed when disconnected. EMULATION: when disconnected, the behavior of the server is emulated at the client. REINTEGRATION: transfer updates to the server and resolve conflicts.
    • 16. Hoard database. Cache equilibrium: (1) there is no uncached file with a higher priority than any cached file; (2) the cache is full, or no uncached file has nonzero priority; (3) each cached file is a copy of the one maintained in the client’s AVSG. Hoard walk.
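The three equilibrium conditions can be written down as a predicate, together with a toy hoard walk that restores them. This is a sketch under simplifying assumptions: the cache capacity is counted in files rather than bytes, priorities are plain integers taken from a hypothetical hoard database, and the function names are illustrative.

```python
def in_equilibrium(priority, cached, capacity, stale):
    """Check the three cache-equilibrium conditions.

    priority: dict mapping every known file to its hoard priority
    cached:   set of files currently in the cache
    capacity: cache size in files (a simplification of byte capacity)
    stale:    set of cached files that differ from the copy in the AVSG
    """
    uncached = set(priority) - cached
    max_uncached = max((priority[f] for f in uncached), default=0)
    min_cached = min((priority[f] for f in cached), default=float("inf"))
    # (1) No uncached file outranks a cached one.
    cond1 = max_uncached <= min_cached
    # (2) The cache is full, or nothing worth caching is left out.
    cond2 = len(cached) >= capacity or max_uncached == 0
    # (3) Every cached file matches the copy held by the AVSG.
    cond3 = not (cached & stale)
    return cond1 and cond2 and cond3


def hoard_walk(priority, capacity):
    """Pick the cache contents that restore equilibrium.

    Returns the highest-priority files with nonzero priority, up to the
    capacity; the caller is assumed to refetch them so none is stale.
    """
    ranked = sorted((f for f, p in priority.items() if p > 0),
                    key=lambda f: (-priority[f], f))
    return set(ranked[:capacity])
```

For example, with priorities `{"a": 5, "b": 3, "c": 0}` and room for two files, the walk selects `a` and `b`, and the predicate then holds as long as both copies are current.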
    • 17. Coda’s security architecture consists of two parts. The first part deals with setting up a secure channel between a client and a server using secure RPC and system-level authentication. The second part deals with controlling access to files.
    • 18. [Figure: Vice server and client (Venus).]
    • 19. Operation     Description
          Read          Read any file in the directory
          Write         Modify any file in the directory
          Lookup        Look up the status of any file
          Insert        Add a new file to the directory
          Delete        Delete an existing file
          Administer    Modify the ACL of the directory
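The per-directory rights in the table map naturally onto a set of flags. The sketch below is an illustration only: the `Right` and `DirectoryAcl` names are hypothetical, principals are plain strings, and features a real ACL implementation may have (such as groups) are left out.

```python
from enum import Flag, auto


class Right(Flag):
    """The six directory operations from the table."""
    READ = auto()
    WRITE = auto()
    LOOKUP = auto()
    INSERT = auto()
    DELETE = auto()
    ADMINISTER = auto()


class DirectoryAcl:
    """Hypothetical access control list attached to one directory."""

    def __init__(self):
        self._entries = {}  # principal name -> combined Right flags

    def grant(self, principal, rights):
        current = self._entries.get(principal, Right(0))
        self._entries[principal] = current | rights

    def allows(self, principal, right):
        """True if the principal holds every flag in `right`."""
        return right in self._entries.get(principal, Right(0))


acl = DirectoryAcl()
acl.grant("alice", Right.READ | Right.LOOKUP | Right.INSERT)
acl.grant("bob", Right.READ | Right.LOOKUP)
```

Because rights are attached to the directory, a check such as `acl.allows("bob", Right.WRITE)` answers the question for every file in that directory at once.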
    • 20. Peter J. Braam, The Coda File System, www.coda.cs.cmu.edu.