Distributed File Systems

Distributed file systems lecture I gave for Andrew Grimshaw's Distributed systems course in the Spring of 2009

  • A file is a named collection of related information recorded on some permanent storage
  • Access transparency – Use same mechanism to access file whether it is local or remote, i.e., map remote files into local file system name space
  • Harder to mask failures. Performance issues – caching, etc.? Can’t really do much about them
  • COTS, legacy apps. Does it really matter where the file is? One copy versus many copies, etc.
  • Lampson – hints for the push model
  • A server exports one or more of its directories to remote clients. Clients access exported directories by mounting them; the contents are then accessed as if they were local
  • Pros: the server is stateless, i.e. no state about open files. Cons: locking is difficult, no concurrency control
  • No consistency semantics – things marked dirty are flushed within 30 seconds; non-dirty items are checked every 5 seconds. Bad performance with heavy load, etc.
  • A user process wants to open a file with pathname P. The kernel resolves that it’s a Vice file and passes it to Venus on that workstation. One of the LWPs uses the cache to examine each directory component D of P. When processing a pathname component, Venus identifies the server to be contacted by examining the volume field of the Fid. Authentication, protection checking, and network failures complicate the matter considerably.
  • Transcript

    • 1. 03-10-09 Some slides are taken from Professor Grimshaw, Ranveer Chandra, Krasimira Kapitnova, etc
    • 2.
      • 3rd-year graduate student working with Professor Grimshaw
      • Interests lie in Operating Systems, Distributed Systems, and more recently Cloud Computing
      • Also
        • Trumpet
        • Sporty things
        • Hardware Junkie
      I like tacos … a lot
    • 3.
      • File System refresher
      • Basic Issues
        • Naming / Transparency
        • Caching
        • Coherence
        • Security
        • Performance
      • Case Studies
        • NFS v3 - v4
        • Lustre
        • AFS 2.0
    • 4.
      • What is a file system?
      • Why have a file system?
      Mmmm, refreshing File Systems
    • 5.
        • Must have
          • Name e.g. “/home/sosa/DFSSlides.ppt”
          • Data – some structured sequence of bytes
        • Tend to also have
          • Size
          • Protection Information
          • Non-symbolic identifier
          • Location
          • Times, etc
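        • A minimal sketch (Python, standard library only) of these attributes as a local POSIX file system already exposes them; a distributed file system layers a location attribute on top:

          import os, stat, sys, time

          # Inspect any local file; defaults to this script itself so the sketch runs as-is.
          path = sys.argv[1] if len(sys.argv) > 1 else __file__
          st = os.stat(path)

          print("name       :", path)                        # name
          print("size       :", st.st_size, "bytes")         # size (data length)
          print("protection :", stat.filemode(st.st_mode))   # protection information
          print("identifier :", (st.st_dev, st.st_ino))      # non-symbolic identifier (device, i-node)
          print("modified   :", time.ctime(st.st_mtime))     # times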
    • 6.
      • A container abstraction to help organize files
        • Generally hierarchical (tree) structure
        • Often a special type of file
      • Directories have a
        • Name
        • Files and directories (if hierarchical) within them
      A large container for tourists
    • 7.
      • Two approaches to sharing files
      • Copy-based
        • Application explicitly copies files between machines
        • Examples: UUCP, FTP, gridFTP, {.*} FTP, Rcp, Scp, etc.
      • Access transparency – i.e. Distributed File Systems
      Sharing is caring
    • 8.
      • Basic idea
        • Find a copy
          • naming is based on machine name of source (viper.cs.virginia.edu), user id, and path
        • Transfer the file to the local file system
          • scp grimshaw@viper.cs.virginia.edu:fred.txt .
        • Read/write
        • Copy back if modified
      • Pros and Cons?
    • 9.
      • Pros
        • Semantics are clear
        • No OS/library modification
      • Cons?
        • Have to deal with the copy model yourself
        • Have to copy whole file
        • Inconsistencies
        • Inconsistent copies all over the place
        • Others?
    • 10.
      • Mechanism to access remote files is the same as for local files (i.e. through the file system hierarchy)
      • Why is this better?
      • … enter Distributed File Systems
    • 11.
      • A Distributed File System is a file system that may have files on more than one machine
      • Distributed File Systems take many forms
        • Network File Systems
        • Parallel File Systems
        • Access Transparent Distributed File Systems
      • Why distribute?
    • 12.
      • Sharing files with other users
        • Others can access your files
        • You can have access to files you wouldn’t regularly have access to
      • Keeping files available for yourself on more than one computer
      • Small amount of local resources
      • High failure rate of local resources
      • Can eliminate version problems (same file copied around with local edits)
    • 13.
      • Naming
      • Performance
      • Caching
      • Consistency Semantics
      • Fault Tolerance
      • Scalability
    • 14.
      • What does a DFS look like to the user?
        • Mount-like protocol, e.g. /../mntPointToBobsSharedFolder/file.txt
        • Unified namespace: everything looks like it’s in the same namespace
      • Pros and Cons?
    • 15.
      • Location transparency
        • Name does not hint at physical location
        • Mount points are not transparent
      • Location Independence
        • File name does not need to be changed when the file’s physical storage location changes
      • Independence without transparency?
    • 16.
      • Generally trade-off the benefits of DFS’s with some performance hits
        • How much depends on workload
        • Always look at workload to figure out what mechanisms to use
      • What are some ways to improve performance?
    • 17.
      • Single architectural feature that contributes most to performance in a DFS!!!
      • Single greatest cause of heartache for programmers of DFS’s
        • Maintaining consistency semantics more difficult
        • Has a large potential impact on scalability
    • 18.
      • Size of the cached units of data
        • Larger sizes make more efficient use of the network – spatial locality, latency
        • Whole files simplify semantics but you can’t store very large files locally
        • Small files
      • Who does what
        • Push vs Pull
        • Important for maintaining consistency
    • 19.
      • Different DFS’s have different consistency semantics
        • UNIX semantics
        • On Close semantics
        • Timeout semantics (data is at most x seconds stale)
      • Pros / Cons? (a sketch of the timeout option follows below)
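      • A rough sketch of the timeout option, assuming a hypothetical fetch_from_server callable that pulls a fresh copy; whatever the client reads is then at most ATTR_TIMEOUT seconds stale:

        import time

        ATTR_TIMEOUT = 5.0   # revalidation interval in seconds (illustrative value)

        class CachedFile:
            def __init__(self, name, fetch_from_server):
                self.name = name
                self.fetch = fetch_from_server   # hypothetical callable: name -> bytes
                self.data, self.fetched_at = None, 0.0

            def read(self):
                # Pull model: the client revalidates on its own timer, so what it
                # returns is at most ATTR_TIMEOUT seconds out of date.
                if self.data is None or time.time() - self.fetched_at > ATTR_TIMEOUT:
                    self.data = self.fetch(self.name)
                    self.fetched_at = time.time()
                return self.data

        cached = CachedFile("/home/sosa/notes.txt", lambda name: b"latest server copy")
        print(cached.read())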
    • 20.
      • Can replicate
        • Fault Tolerance
        • Performance
      • Replication is inherently location-opaque i.e. we need location independence in naming
      • Different forms of replication mechanisms, different consistency semantics
        • Tradeoffs, tradeoffs, tradeoffs
    • 21.
      • Mount-based DFS
        • NFS version 3
        • Others include SMB, CIFS, NFS version 4
      • Parallel DFS
        • Lustre
        • Others include HDFS, Google File System, etc
      • Non-Parallel Unified Namespace DFS’s
        • Sprite
        • AFS version 2.0 (basis for many other DFS’s)
          • Coda
          • AFS 3.0
    • 22.
    • 23.
      • Most commonly used DFS ever!
      • Goals
        • Machine & OS Independent
        • Crash Recovery
        • Transparent Access
        • “Reasonable” Performance
      • Design
        • All machines are both clients and servers
        • RPC (on top of UDP v.1, v.2+ on TCP)
          • Open Network Computing Remote Procedure Call
          • External Data Representation (XDR)
        • Stateless Protocol
    • 24.
    • 25.
      • Client sends path name to server with request to mount
      • If path is legal and exported, server returns file handle
        • Contains FS type, disk, i-node number of directory, security info
        • Subsequent accesses use file handle
      • Mount can be either at boot or automount
        • Automount: directories are mounted on first use
        • Why helpful?
      • Mount only affects client view
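      • A toy sketch of that exchange (not the real MOUNT wire protocol): the client sends a path, the server returns an opaque file handle only if the path is exported, and all later accesses present that handle. The export list and the hash standing in for the handle are invented:

        import hashlib

        EXPORTS = {"/export/home", "/export/proj"}   # directories this server exports (made up)

        def mount_request(path):
            """Server side: hand back an opaque file handle for an exported path."""
            if path not in EXPORTS:
                raise PermissionError(path + " is not exported")
            # A real handle packs FS type, disk, directory i-node number and security
            # info; an opaque hash stands in for all of that here.
            return hashlib.sha1(path.encode()).digest()

        handle = mount_request("/export/home")   # the client keeps this for later accesses
        print("file handle:", handle.hex())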
    • 26.
      • Mounting (part of) a remote file system in NFS.
    • 27.
      • Mounting nested directories from multiple servers in NFS.
    • 28.
      • Supports directory and file access via remote procedure calls (RPCs)
      • All UNIX system calls supported other than open & close
      • Open and close are intentionally not supported
        • For a read , client sends lookup message to server
        • Lookup returns file handle but does not copy info in internal system tables
        • Subsequently, read contains file handle, offset and num bytes
        • Each message is self-contained – flexible, but?
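      • A sketch of that stateless interaction: lookup hands out a handle without the server recording anything, and each read carries handle, offset and byte count, so every request can be served on its own. The handle table and file contents are stand-ins, not NFS data structures:

        # The server holds no per-client open-file state, only a handle -> data mapping.
        FILES = {b"fh-1": b"The quick brown fox jumps over the lazy dog"}

        def lookup(name):
            # Real NFS resolves one path component per lookup; one name, one handle here.
            return b"fh-1" if name == "fred.txt" else None

        def read(handle, offset, count):
            # Self-contained request: everything needed to serve it is in the arguments.
            return FILES[handle][offset:offset + count]

        fh = lookup("fred.txt")
        print(read(fh, 4, 5))   # b'quick'
        print(read(fh, 0, 3))   # b'The' -- requests can arrive in any order, no open/close needed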
    • 29.
      • Reading data from a file in NFS version 3.
      • Reading data using a compound procedure in version 4.
    • 30.
      • Some general mandatory file attributes in NFS.
      TYPE – The type of the file (regular, directory, symbolic link)
      SIZE – The length of the file in bytes
      CHANGE – Indicator for a client to see if and/or when the file has changed
      FSID – Server-unique identifier of the file's file system
    • 31.
      • Some general recommended file attributes.
      ACL – An access control list associated with the file
      FILEHANDLE – The server-provided file handle of this file
      FILEID – A file-system unique identifier for this file
      FS_LOCATIONS – Locations in the network where this file system may be found
      OWNER – The character-string name of the file's owner
      TIME_ACCESS – Time when the file data were last accessed
      TIME_MODIFY – Time when the file data were last modified
      TIME_CREATE – Time when the file was created
    • 32.
      • All communication done in the clear
      • Client sends the user id and group id of the request to the NFS server
      • Discuss
    • 33.
      • Consistency semantics are dirty
        • Checks non-dirty items every 5 seconds
        • Things marked dirty flushed within 30 seconds
      • Performance under load is horrible, why?
      • Cross-mount hell - paths to files different on different machines
      • ID mismatch between domains
    • 34.
      • Goals
        • Improved Access and good performance on the Internet
        • Better Scalability
        • Strong Security
        • Cross-platform interoperability and ease to extend
    • 35.
      • Stateful Protocol (Open + Close)
      • Compound Operations (Fully utilize bandwidth)
      • Lease-based Locks (Locking built-in)
      • “Delegation” to clients (Less work for the server)
      • Close-Open Cache Consistency (Timeouts still for attributes and directories)
      • Better security
    • 36.
      • Borrowed model from CIFS (Common Internet File System) see MS
      • Open/Close
        • Opens do lookup, create, and lock all in one (what a deal)!
        • Locks / delegation (explained later) released on file close
        • Always a notion of a “current file handle” i.e. see pwd
    • 37.
      • Problem: Normal filesystem semantics have too many RPC’s (boo)
      • Solution: Group many calls into one call (yay)
      • Semantics
        • Run sequentially
        • Fails on first failure
        • Returns status of each individual RPC in the compound response (either to failure or success)
      Compound Kitty
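      • A sketch of those semantics: operations run in order, execution stops at the first failure, and the reply carries a status for every operation attempted. The ok/fail functions are stand-ins for real NFSv4 operations such as PUTFH, LOOKUP and READ:

        def run_compound(operations):
            """Run the ops sequentially; stop at the first failure; report each status."""
            statuses = []
            for op in operations:
                try:
                    op()
                    statuses.append("OK")
                except Exception as err:
                    statuses.append("ERR: " + str(err))
                    break   # fail-fast: the remaining ops are never attempted
            return statuses

        def ok():
            pass

        def fail():
            raise FileNotFoundError("no such file")

        print(run_compound([ok, fail, ok]))   # ['OK', 'ERR: no such file'] -- third op skipped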
    • 38.
      • Both byte-range and file locks
      • Heartbeats keep locks alive (renew lock)
      • If the server fails, it waits at least the agreed-upon lease time (constant) before accepting any other lock requests
      • If a client fails, its locks are released by the server at the end of the lease period
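      • A sketch of that lease logic with a made-up lease period: a lock counts as held only while its lease was renewed recently, so a crashed client's lock simply expires, and a restarted server can wait out one lease period before granting new locks:

        import time

        LEASE_PERIOD = 30.0   # seconds; illustrative, not a value mandated by NFSv4

        class LockServer:
            def __init__(self):
                self.locks = {}   # (file, byte_range) -> (owner, time_of_last_renewal)

            def acquire(self, key, owner):
                held = self.locks.get(key)
                if held and time.time() - held[1] < LEASE_PERIOD:
                    return False   # lease still fresh: someone else holds the lock
                self.locks[key] = (owner, time.time())
                return True

            def renew(self, key, owner):
                # The client's heartbeat: refresh the lease before it runs out.
                if key in self.locks and self.locks[key][0] == owner:
                    self.locks[key] = (owner, time.time())

        srv = LockServer()
        print(srv.acquire(("fred.txt", (0, 100)), "clientA"))   # True
        print(srv.acquire(("fred.txt", (0, 100)), "clientB"))   # False until A's lease lapses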
    • 39.
      • Tells client no one else has the file
      • Client exposes callbacks
    • 40.
      • Any opens that happen after a close finishes are consistent with the information from the last close
      • Last close wins the competition
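      • A sketch of close-to-open consistency as described above: a client pushes its data when it closes, the next open revalidates against the server, and whichever close finishes last determines what later opens see:

        class Server:
            def __init__(self):
                self.data, self.version = b"", 0

        class Client:
            def __init__(self, server):
                self.srv, self.data, self.version = server, None, -1

            def open(self):
                # Open-time check: refetch if the file changed since we cached it.
                if self.version != self.srv.version:
                    self.data, self.version = self.srv.data, self.srv.version

            def close(self, new_data):
                # Close-time flush: the last close to finish overwrites the server copy.
                self.srv.data = new_data
                self.srv.version += 1
                self.data, self.version = new_data, self.srv.version

        srv = Server()
        a, b = Client(srv), Client(srv)
        a.open(); a.close(b"edit from A")
        b.open()          # b's open follows a's close, so b sees the edit
        print(b.data)     # b'edit from A'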
    • 41.
      • Uses the GSS-API framework
      • All id’s are formed with
        • user@domain
        • group@domain
      • Every implementation must have Kerberos v5
      • Every implementation must have LIPKEY
      Meow
    • 42.
      • Replication / Migration mechanism added
        • Special error messages to indicate migration
        • Special attribute for both replication and migration that gives the location of the other / new location
        • May have read-only replicas
    • 43.
    • 44.
      • People don’t like to move
      • Requires Kerberos (the death of many good distributed file systems)
      • Looks just like V3 to the end-user, and V3 is good enough
    • 45.
    • 46.
      • Need for a file system for large clusters that has the following attributes
        • Highly scalable > 10,000 nodes
        • Provide petabytes of storage
        • High throughput (100 GB/sec)
      • Datacenters have different needs so we need a general-purpose back-end file system
    • 47.
      • Open-source object-based cluster file system
      • Fully compliant with POSIX
      • Features (i.e. what I will discuss)
        • Object Protocols
        • Intent-based Locking
        • Adaptive Locking Policies
        • Aggressive Caching
    • 48.
    • 49.
    • 50.
    • 51.
    • 52.
    • 53.
      • Policy depends on context
      • Mode 1: performing operations on something that mostly only they use (e.g. /home/username)
      • Mode 2: performing operations on a highly contended resource (e.g. /tmp)
      • DLM capable of granting locks on an entire subtree and whole files
    • 54.
      • POSIX
      • Keeps local journal of updates for locked files
        • One per file operation
        • Hard linked files get special treatment with subtree locks
      • Lock revoked -> updates flushed and replayed
      • Use subtree change times to validate cache entries
      • Additionally features collaborative caching -> referrals to other dedicated cache service
    • 55. Security
      • Supports GSS-API
        • Supports (does not require) Kerberos
        • Supports PKI mechanisms
      • Did not want to be tied down to one mechanism
    • 56.
    • 57.
      • Named after Andrew Carnegie and Andrew Mellon
        • Transarc Corp. and then IBM took over development of AFS
        • In 2000 IBM made OpenAFS available as open source
      • Goals
        • Large scale (thousands of servers and clients)
        • User mobility
        • Scalability
        • Heterogeneity
        • Security
        • Location transparency
        • Availability
    • 58.
      • Features:
        • Uniform name space
        • Location independent file sharing
        • Client side caching with cache consistency
        • Secure authentication via Kerberos
        • High availability through automatic switchover of replicas
        • Scalability to span 5000 workstations
    • 59.
      • Based on the upload/download model
        • Clients download and cache files
        • Server keeps track of clients that cache the file
        • Clients upload files at end of session
      • Whole file caching is key
        • Later amended to block operations (v3)
        • Simple and effective
      • Kerberos for Security
      • AFS servers are stateful
        • Keep track of clients that have cached files
        • Recall files that have been modified
    • 60.
      • Clients have partitioned name space:
        • Local name space and shared name space
        • A cluster of dedicated servers (Vice) presents the shared name space
        • Clients run Virtue protocol to communicate with Vice
    • 61.
    • 62.
      • AFS’s storage is arranged in volumes
        • Usually associated with files of a particular client
      • AFS dir entry maps vice files/dirs to a 96-bit fid
        • Volume number
        • Vnode number: index into i-node array of a volume
        • Uniquifier: allows reuse of vnode numbers
      • Fids are location transparent
        • File movements do not invalidate fids
      • Location information kept in volume-location database
        • Volumes migrated to balance available disk space, utilization
        • Volume movement is atomic; operation aborted on server crash
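      • A sketch of that 96-bit fid as three 32-bit fields, plus a toy volume-location lookup; the field widths follow the slide, while the byte order and database contents are assumptions:

        import struct

        def pack_fid(volume, vnode, uniquifier):
            # 96 bits = three unsigned 32-bit fields (big-endian chosen for illustration).
            return struct.pack(">III", volume, vnode, uniquifier)

        def unpack_fid(fid):
            return struct.unpack(">III", fid)

        VLDB = {7: "vice3.cs.virginia.edu"}   # volume-location database: volume -> server (toy)

        fid = pack_fid(volume=7, vnode=42, uniquifier=1)
        volume, vnode, uniq = unpack_fid(fid)
        print(len(fid) * 8, "bit fid; volume served by", VLDB[volume])
        # Migrating the volume only changes the VLDB entry; the fid itself stays valid.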
    • 63.
      • User process -> open file F
      • The kernel resolves that it’s a Vice file -> passes it to Venus
      • For each directory component D of the path:
        • D is in the cache & has a callback -> use it without any network communication
        • D is in the cache but has no callback -> contact the appropriate server for a new copy; establish callback
        • D is not in the cache -> fetch it from the server; establish callback
      • File F is identified -> create a current cache copy
      • Venus returns to the kernel, which opens F and returns its handle to the process
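      • A sketch of that open path, with a hypothetical fetch_from_vice standing in for the server round trip; the point is the three cache/callback cases per pathname component:

        cache = {}   # component name -> (data, has_callback)

        def fetch_from_vice(name):
            # Stand-in for contacting the appropriate Vice server (chosen via the
            # volume field of the Fid); establishes a callback as a side effect.
            return ("<contents of %s>" % name, True)

        def venus_lookup(component):
            entry = cache.get(component)
            if entry and entry[1]:
                return entry[0]   # in the cache with a callback: no network traffic
            # in the cache without a callback, or not cached at all: go to the server
            data, has_callback = fetch_from_vice(component)
            cache[component] = (data, has_callback)
            return data

        def venus_open(path):
            data = None
            for component in path.strip("/").split("/"):
                data = venus_lookup(component)   # resolve each component D in turn
            return data                          # current cache copy of file F

        print(venus_open("/afs/cs/home/sosa/fred.txt"))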
    • 64.
      • AFS caches entire files from servers
        • Client interacts with servers only during open and close
      • OS on client intercepts calls, passes to Venus
        • Venus is a client process that caches files from servers
        • Venus contacts Vice only on open and close
        • Reads and writes bypass Venus
      • Works due to callback :
        • Server updates state to record caching
        • Server notifies client before allowing another client to modify
        • Clients lose their callback when someone writes the file
      • Venus caches dirs and symbolic links for path translation
    • 65.
      • The use of local copies when opening a session in Coda.
    • 66.
      • A descendant of AFS v2 (AFS v3 went another way with large chunk caching)
      • Goals
        • More resilient to server and network failures
        • Constant Data Availability
        • Portable computing
    • 67.
      • Keeps whole file caching, callbacks, end-to-end encryption
      • Adds full server replication
      • General Update Protocol
        • Known as Coda Optimistic Protocol
        • COP1 (first phase) performs actual semantic operation to servers (using multicast if available)
        • COP2 sends a data structure called an update set which summarizes the client’s knowledge. These messages are piggybacked on later COP1’s
    • 68.
      • Disconnected Operation (KEY)
        • Hoarding
          • Periodically reevaluates which objects merit retention in the cache (hoard walking)
          • Relies on both implicit and a lot of explicit info (profiles etc)
        • Emulating, i.e. maintaining a replay log
        • Reintegration – replay the replay log
      • Conflict Resolution
        • Gives repair tool
        • Log to give to user to manually fix issue
    • 69.
      • The state-transition diagram of a Coda client with respect to a volume.
    • 70.
      • AFS deployments in academia and government (100’s)
      • Security model required Kerberos
        • Many organizations not willing to make the costly switch
      • AFS (but not Coda) was not integrated into the Unix FS. Separate “ls”, different – though similar – API
      • Session semantics not appropriate for many applications
    • 71.
      • Goals
        • Efficient use of large main memories
        • Support for multiprocessor workstations
        • Efficient network communication
        • Diskless Operation
        • Exact Emulation of UNIX FS semantics
      • Location transparent UNIX FS
    • 72.
      • Naming
        • Local prefix table which maps path-name prefixes to servers
        • Cached locations
        • Otherwise, location information is embedded in remote stubs in the tree hierarchy
      • Caching
        • Needs sequential consistency
        • If one client wants to write, the server disables caching on all clients that have the file open. Assumes this isn’t very costly since concurrent write sharing doesn’t happen often
      • No security between kernels; everything runs over a trusted network
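      • A sketch of the prefix-table lookup from the Naming bullets above; the table contents are invented, and a real Sprite client builds and refreshes the table dynamically (the cached locations on the slide) rather than hard-coding it:

        # Path-name prefix -> server holding that subtree (invented contents).
        PREFIX_TABLE = {
            "/users":      "server-a",
            "/users/sosa": "server-b",   # a longer, more specific prefix wins
            "/tmp":        "server-c",
        }

        def locate(path):
            """Pick the server via the longest matching path-name prefix."""
            matches = [p for p in PREFIX_TABLE if path == p or path.startswith(p + "/")]
            if not matches:
                return None   # per the slide: fall back to location info in remote stubs
            return PREFIX_TABLE[max(matches, key=len)]

        print(locate("/users/sosa/thesis.tex"))   # server-b
        print(locate("/tmp/scratch"))             # server-c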
    • 73.
      • The best way to implement something depends very highly on the goals you want to achieve
      • Always start with goals before deciding on consistency semantics