Distributed File Systems
A distributed file systems lecture given for Andrew Grimshaw's Distributed Systems course in the Spring of 2009 (03-10-09). Some slides are taken from Professor Grimshaw, Ranveer Chandra, Krasimira Kapitnova, etc.

- 3rd-year graduate student working with Professor Grimshaw
- Interests lie in Operating Systems, Distributed Systems, and more recently Cloud Computing
- Also:
  - Trumpet
  - Sporty things
  - Hardware junkie
I like tacos … a lot

- File System refresher
- Basic Issues
  - Naming / Transparency
  - Caching
  - Coherence
  - Security
  - Performance
- Case Studies
  - NFS v3 - v4
  - Lustre
  - AFS 2.0

- What is a file system?
- Why have a file system?
Mmmm, refreshing File Systems

- Must have
  - Name, e.g. "/home/sosa/DFSSlides.ppt"
  - Data – some structured sequence of bytes
- Tend to also have
  - Size
  - Protection information
  - Non-symbolic identifier
  - Location
  - Times, etc.

- A container abstraction to help organize files
  - Generally hierarchical (tree) structure
  - Often a special type of file
- Directories have a
  - Name
  - Files and directories (if hierarchical) within them
A large container for tourists

- Two approaches to sharing files
- Copy-based
  - Application explicitly copies files between machines
  - Examples: UUCP, FTP, gridFTP, {.*}FTP, rcp, scp, etc.
- Access transparency – i.e. Distributed File Systems
Sharing is caring

- Basic idea
  - Find a copy
    - Naming is based on the machine name of the source (viper.cs.virginia.edu), user id, and path
  - Transfer the file to the local file system
    - scp grimshaw@viper.cs.virginia.edu:fred.txt .
  - Read/write
  - Copy back if modified
- Pros and cons? (a minimal sketch of this cycle follows)
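To make the copy-based cycle concrete, here is a minimal Python sketch of it: fetch with scp, edit locally, copy back. The remote host and file come from the slide's example; the function names and the trivial "append a line" edit are illustrative only.

```python
import subprocess

# Remote file from the slide's example; purely illustrative.
REMOTE = "grimshaw@viper.cs.virginia.edu:fred.txt"
LOCAL = "fred.txt"

def fetch():
    # 1. Find a copy and transfer it to the local file system.
    subprocess.run(["scp", REMOTE, LOCAL], check=True)

def edit():
    # 2. Read/write the local copy like any ordinary local file.
    with open(LOCAL, "a") as f:
        f.write("local edit\n")

def push_back():
    # 3. Copy the file back if it was modified.
    subprocess.run(["scp", LOCAL, REMOTE], check=True)

if __name__ == "__main__":
    fetch()
    edit()
    push_back()   # concurrent editors silently lose their changes (inconsistency)
```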
- Pros
  - Semantics are clear
  - No OS/library modification
- Cons?
  - Users have to deal with the copy model explicitly
  - Have to copy the whole file
  - Inconsistencies
  - Inconsistent copies all over the place
  - Others?

- Mechanism to access remote files is the same as for local files (i.e. through the file system hierarchy)
- Why is this better?
- … enter Distributed File Systems

- A Distributed File System is a file system that may have files on more than one machine
- Distributed File Systems take many forms
  - Network File Systems
  - Parallel File Systems
  - Access Transparent Distributed File Systems
- Why distribute?

- Sharing files with other users
  - Others can access your files
  - You can have access to files you wouldn't regularly have access to
- Keeping files available for yourself on more than one computer
- Small amount of local resources
- High failure rate of local resources
- Can eliminate version problems (same file copied around with local edits)

- Naming
- Performance
- Caching
- Consistency Semantics
- Fault Tolerance
- Scalability

- What does a DFS look like to the user?
  - Mount-like protocol, e.g. /../mntPointToBobsSharedFolder/file.txt
  - Unified namespace: everything looks like it's in the same namespace
- Pros and cons?

- Location transparency
  - Name does not hint at physical location
  - Mount points are not transparent
- Location independence
  - File name does not need to be changed when the file's physical storage location changes
- Independence without transparency?

- Generally trade off the benefits of DFS's against some performance hits
  - How much depends on workload
  - Always look at the workload to figure out what mechanisms to use
- What are some ways to improve performance?

- Caching: the single architectural feature that contributes most to performance in a DFS!!!
- Also the single greatest cause of heartache for programmers of DFS's
  - Maintaining consistency semantics is more difficult
  - Has a large potential impact on scalability

- Size of the cached units of data
  - Larger sizes make more efficient use of the network – spatial locality, latency
  - Whole files simplify semantics, but very large files can't be stored locally
  - Small files
- Who does what
  - Push vs. pull
  - Important for maintaining consistency

- Different DFS's have different consistency semantics
  - UNIX semantics
  - On-close semantics
  - Timeout semantics (at least x-seconds up-to-date)
- Pros / cons? (a toy example of timeout semantics follows)
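As a rough illustration of timeout ("at least x-seconds up-to-date") semantics, here is a toy client cache that trusts a cached copy only while it is younger than a freshness window. The class name, the window length, and the fetch callable are all made up for the sketch.

```python
import time

FRESHNESS_WINDOW = 3.0  # seconds a cached copy may be trusted (illustrative)

class TimeoutCache:
    def __init__(self, fetch_from_server):
        self.fetch = fetch_from_server     # callable: name -> data
        self.entries = {}                  # name -> (data, time fetched)

    def read(self, name):
        entry = self.entries.get(name)
        if entry is not None:
            data, fetched_at = entry
            if time.time() - fetched_at < FRESHNESS_WINDOW:
                return data                # may be up to FRESHNESS_WINDOW stale
        data = self.fetch(name)            # revalidate / refetch from the server
        self.entries[name] = (data, time.time())
        return data

# Usage (hypothetical server object): cache = TimeoutCache(lambda n: server.read(n))
```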
- Can replicate for
  - Fault tolerance
  - Performance
- Replication is inherently location-opaque, i.e. we need location independence in naming
- Different forms of replication mechanisms, different consistency semantics
  - Tradeoffs, tradeoffs, tradeoffs

- Mount-based DFS
  - NFS version 3
  - Others include SMB, CIFS, NFS version 4
- Parallel DFS
  - Lustre
  - Others include HDFS, the Google File System, etc.
- Non-parallel unified-namespace DFS's
  - Sprite
  - AFS version 2.0 (basis for many other DFS's)
    - Coda
    - AFS 3.0

- Most commonly used DFS ever!
- Goals
  - Machine & OS independent
  - Crash recovery
  - Transparent access
  - "Reasonable" performance
- Design
  - All machines are both clients and servers
  - RPC (on top of UDP in v1; v2+ on TCP)
    - Open Network Computing Remote Procedure Call (ONC RPC)
    - External Data Representation (XDR)
  - Stateless protocol

- Client sends a path name to the server with a request to mount
- If the path is legal and exported, the server returns a file handle
  - Contains FS type, disk, i-node number of directory, security info
  - Subsequent accesses use the file handle
- Mount can happen either at boot or via automount
  - Automount: directories are mounted on use
  - Why helpful?
- Mount only affects the client's view
(a toy sketch of the mount exchange follows)
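A toy sketch of the mount handshake described above: the client names a path, and only if it is exported does the server hand back an opaque handle that later requests must present. The handle format and class names are illustrative, not the real NFS wire protocol.

```python
import os
import secrets

class MountServer:
    def __init__(self, exports):
        self.exports = set(exports)        # directories the server exports
        self.handles = {}                  # opaque handle -> local path

    def mount(self, path):
        # Only legal, exported paths are granted a file handle.
        if path not in self.exports or not os.path.isdir(path):
            raise PermissionError("not exported: " + path)
        handle = secrets.token_hex(16)     # opaque to the client
        self.handles[handle] = path
        return handle                      # client presents this in later requests

# Usage (hypothetical export): server = MountServer(["/srv/shared"])
# fh = server.mount("/srv/shared")
```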
- Figure: Mounting (part of) a remote file system in NFS.

- Figure: Mounting nested directories from multiple servers in NFS.

- Supports directory and file access via remote procedure calls (RPCs)
- All UNIX system calls supported other than open & close
- Open and close are intentionally not supported
  - For a read, the client sends a lookup message to the server
  - Lookup returns a file handle but does not copy info into internal system tables
  - Subsequently, each read contains the file handle, offset, and number of bytes
  - Each message is self-contained – flexible, but? (see the sketch below)
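The stateless style can be sketched like this: lookup hands out a file handle without recording anything about the caller, and every read names the handle, offset, and byte count itself. This is a toy model, not NFS code; in real NFS the handle encodes enough (file system id, i-node number) to survive a server reboot, whereas here it is just an index into a table.

```python
import os

class StatelessFileServer:
    """Toy model of NFSv3-style access: no open/close, no per-client state."""

    def __init__(self, export_root):
        self.paths = {0: export_root}      # handle -> path; 0 is the root handle
        self.next_handle = 1

    def lookup(self, dir_handle, name):
        # Resolve one name inside a directory and hand back a file handle.
        # Nothing about the calling client is remembered.
        path = os.path.join(self.paths[dir_handle], name)
        if not os.path.exists(path):
            raise FileNotFoundError(path)
        handle = self.next_handle
        self.paths[handle] = path
        self.next_handle += 1
        return handle

    def read(self, file_handle, offset, count):
        # The request itself carries handle, offset, and byte count, so it is
        # self-contained; a lost reply can simply be retried.
        with open(self.paths[file_handle], "rb") as f:
            f.seek(offset)
            return f.read(count)
```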
- Figures: Reading data from a file in NFS version 3; reading data using a compound procedure in version 4.

- Some general mandatory file attributes in NFS:
  - TYPE – the type of the file (regular, directory, symbolic link)
  - SIZE – the length of the file in bytes
  - CHANGE – indicator for a client to see if and/or when the file has changed
  - FSID – server-unique identifier of the file's file system

- Some general recommended file attributes:
  - ACL – an access control list associated with the file
  - FILEHANDLE – the server-provided file handle of this file
  - FILEID – a file-system-unique identifier for this file
  - FS_LOCATIONS – locations in the network where this file system may be found
  - OWNER – the character-string name of the file's owner
  - TIME_ACCESS – time when the file data were last accessed
  - TIME_MODIFY – time when the file data were last modified
  - TIME_CREATE – time when the file was created

- All communication is done in the clear
- Client sends the user id and group id of the request to the NFS server
- Discuss

- Consistency semantics are dirty
  - Checks non-dirty items every 5 seconds
  - Things marked dirty are flushed within 30 seconds
- Performance under load is horrible – why?
- Cross-mount hell: paths to files differ on different machines
- ID mismatch between domains

- Goals
  - Improved access and good performance on the Internet
  - Better scalability
  - Strong security
  - Cross-platform interoperability and ease of extension

- Stateful protocol (open + close)
- Compound operations (fully utilize bandwidth)
- Lease-based locks (locking built in)
- "Delegation" to clients (less work for the server)
- Close-open cache consistency (timeouts are still used for attributes and directories)
- Better security

- Borrowed model from CIFS (Common Internet File System); see MS
- Open/close
  - Opens do lookup, create, and lock all in one (what a deal)!
  - Locks / delegation (explained later) are released on file close
  - Always a notion of a "current file handle", cf. pwd

- Problem: normal filesystem semantics need too many RPCs (boo)
- Solution: group many calls into one call (yay)
- Semantics (sketched below)
  - Run sequentially
  - Fails on first failure
  - Returns the status of each individual RPC in the compound response (up to the failure, or all on success)
Compound Kitty
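A minimal sketch of those compound semantics: operations run in order, execution stops at the first failure, and every attempted operation reports a status. The operation encoding (method name plus argument tuple) is invented for the example, not the NFSv4 encoding.

```python
def run_compound(server, operations):
    """Toy NFSv4-style compound: ops run in order, execution stops at the
    first failure, and each attempted op reports its own status."""
    statuses = []
    for op_name, args in operations:
        try:
            getattr(server, op_name)(*args)
            statuses.append((op_name, "OK"))
        except Exception as err:
            statuses.append((op_name, f"ERROR: {err}"))
            break                          # remaining ops are not attempted
    return statuses

# Usage with the earlier toy server (hypothetical):
# run_compound(srv, [("lookup", (0, "a.txt")), ("read", (1, 0, 4096))])
```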
- Both byte-range and file locks
- Heartbeats keep locks alive (renew the lease)
- If the server fails, it waits at least the agreed-upon lease time (a constant) before accepting any other lock requests
- If a client fails, its locks are released by the server at the end of the lease period
(a toy lease-lock table is sketched below)
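The lease idea can be sketched as a small lock table: an acquire succeeds only if no live lease is held by someone else, and a client heartbeat simply pushes its expiry forward. The 30-second lease length and all names are illustrative.

```python
import time

LEASE_SECONDS = 30.0                     # illustrative lease length

class LeaseLockTable:
    def __init__(self):
        self.locks = {}                  # (file, byte range) -> (owner, expiry)

    def acquire(self, key, owner):
        holder = self.locks.get(key)
        if holder and holder[1] > time.time() and holder[0] != owner:
            return False                 # someone else still holds a live lease
        self.locks[key] = (owner, time.time() + LEASE_SECONDS)
        return True

    def renew(self, key, owner):
        # Client heartbeat: push the expiry forward while the client is alive.
        holder = self.locks.get(key)
        if holder and holder[0] == owner:
            self.locks[key] = (owner, time.time() + LEASE_SECONDS)

# A crashed client simply stops renewing; once its lease expires the
# server hands the lock to the next acquire() call.
```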
- Delegation: the server tells the client that no one else has the file
- The client exposes callbacks

- Any opens that happen after a close finishes are consistent with the data from the last close
- Last close wins the competition

- Uses the GSS-API framework
- All ids are formed as
  - [email_address]
  - [email_address]
- Every implementation must have Kerberos v5
- Every implementation must have LIPKEY
Meow

- Replication / migration mechanism added
  - Special error messages to indicate migration
  - Special attribute for both replication and migration that gives the other / new location
  - May have read-only replicas

- People don't like to move
- Requires Kerberos (the death of many good distributed file systems)
- Looks just like v3 to the end user, and v3 is good enough

- Need a file system for large clusters with the following attributes
  - Highly scalable: > 10,000 nodes
  - Provides petabytes of storage
  - High throughput (100 GB/sec)
- Datacenters have different needs, so we need a general-purpose back-end file system

- Open-source object-based cluster file system
- Fully POSIX compliant
- Features (i.e. what I will discuss)
  - Object protocols
  - Intent-based locking
  - Adaptive locking policies
  - Aggressive caching

- Policy depends on context
- Mode 1: performing operations on something that mostly only they use (e.g. /home/username)
- Mode 2: performing operations on a highly contended resource (e.g. /tmp)
- The DLM (distributed lock manager) can grant locks on an entire subtree as well as on whole files

- POSIX
- Keeps a local journal of updates for locked files (sketched below)
  - One record per file operation
  - Hard-linked files get special treatment with subtree locks
- Lock revoked -> updates are flushed and replayed
- Uses subtree change times to validate cache entries
- Additionally features collaborative caching -> referrals to other dedicated cache services
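Purely as an illustration of "journal updates under a lock, flush and replay on revocation" (not Lustre's actual format or API), a client-side journal might look like this:

```python
class LockedFileJournal:
    """Illustrative client-side journal: one record per file operation
    performed under a held lock, flushed when the lock is revoked."""

    def __init__(self, send_to_server):
        self.send = send_to_server       # callable: record -> None (hypothetical)
        self.records = []

    def log(self, op, path, *args):
        # e.g. log("write", "/home/user/f", offset, data) or log("unlink", path)
        self.records.append((op, path, args))

    def on_lock_revoked(self):
        # Flush: replay the journal against the server, then drop it.
        for record in self.records:
            self.send(record)
        self.records.clear()
```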
Security
- Supports GSS-API
  - Supports (does not require) Kerberos
  - Supports PKI mechanisms
- Did not want to be tied down to one mechanism

- Named after Andrew Carnegie and Andrew Mellon
  - Transarc Corp. and then IBM took over development of AFS
  - In 2000 IBM made OpenAFS available as open source
- Goals
  - Large scale (thousands of servers and clients)
  - User mobility
  - Scalability
  - Heterogeneity
  - Security
  - Location transparency
  - Availability

- Features:
  - Uniform name space
  - Location-independent file sharing
  - Client-side caching with cache consistency
  - Secure authentication via Kerberos
  - High availability through automatic switchover of replicas
  - Scalability to span 5000 workstations

- Based on the upload/download model
  - Clients download and cache files
  - Server keeps track of clients that cache the file
  - Clients upload files at the end of a session
- Whole-file caching is key
  - Later amended to block operations (v3)
  - Simple and effective
- Kerberos for security
- AFS servers are stateful
  - Keep track of clients that have cached files
  - Recall files that have been modified

- Clients have a partitioned name space:
  - Local name space and shared name space
  - A cluster of dedicated servers (Vice) presents the shared name space
  - Clients run the Virtue protocol to communicate with Vice

- AFS's storage is arranged in volumes
  - Usually associated with the files of a particular client
- An AFS directory entry maps Vice files/dirs to a 96-bit fid
  - Volume number
  - Vnode number: index into the i-node array of a volume
  - Uniquifier: allows reuse of vnode numbers
- Fids are location transparent
  - File movements do not invalidate fids
- Location information is kept in a volume-location database
  - Volumes are migrated to balance available disk space and utilization
  - Volume movement is atomic; the operation aborts on a server crash

- User process -> open file F
- The kernel resolves that it's a Vice file -> passes it to Venus
- For each directory component D of the path:
  - D is in the cache & has a callback -> use it without any network communication
  - D is in the cache but has no callback -> contact the appropriate server for a new copy; establish a callback
  - D is not in the cache -> fetch it from the server; establish a callback
- File F is identified -> create a current cache copy
- Venus returns to the kernel, which opens F and returns its handle to the process

- AFS caches entire files from servers
  - The client interacts with servers only during open and close
- The OS on the client intercepts calls and passes them to Venus
  - Venus is a client process that caches files from servers
  - Venus contacts Vice only on open and close
  - Reads and writes bypass Venus
- Works due to callbacks:
  - Server updates its state to record caching
  - Server notifies the client before allowing another client to modify the file
  - Clients lose their callback when someone writes the file
- Venus caches directories and symbolic links for path translation
(a toy callback cache is sketched below)
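A toy model of the callback scheme: a cached whole file with an unbroken callback promise is used with no network traffic at all; otherwise the client refetches the file and re-establishes the callback. Class and method names are illustrative, and the server is assumed to expose a fetch call.

```python
class VenusCache:
    """Toy AFS-style client cache: whole files plus per-file callback promises."""

    def __init__(self, server):
        self.server = server             # hypothetical server with a fetch() call
        self.files = {}                  # name -> whole-file contents
        self.callback = {}               # name -> True while the promise holds

    def open(self, name):
        if name in self.files and self.callback.get(name):
            return self.files[name]      # no network traffic at all
        # Missing, or callback broken: fetch the whole file, re-establish callback.
        data = self.server.fetch(name)   # server records us as a caching client
        self.files[name] = data
        self.callback[name] = True
        return data

    def break_callback(self, name):
        # Invoked by the server just before another client modifies the file.
        self.callback[name] = False
```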
- Figure: The use of local copies when opening a session in Coda.

- A descendant of AFS v2 (AFS v3 went another way, with large-chunk caching)
- Goals
  - More resilient to server and network failures
  - Constant data availability
  - Portable computing

- Keeps whole-file caching, callbacks, end-to-end encryption
- Adds full server replication
- General update protocol
  - Known as the Coda Optimistic Protocol
  - COP1 (first phase) performs the actual semantic operation at the servers (using multicast if available)
  - COP2 sends a data structure called an update set, which summarizes the client's knowledge; these messages are piggybacked on later COP1s

- Disconnected operation (KEY)
  - Hoarding
    - Periodically reevaluates which objects merit retention in the cache (hoard walking)
    - Relies on both implicit and a lot of explicit info (profiles, etc.)
  - Emulating, i.e. maintaining a replay log
  - Reintegration: replay the replay log
- Conflict resolution
  - Provides a repair tool
  - A log is given to the user to fix the issue manually
(a toy emulate/reintegrate cycle is sketched below)
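The emulate/reintegrate cycle, reduced to a sketch: while disconnected, updates go into a replay log; on reconnection the log is replayed at the server and conflicting updates are set aside for manual repair. The server interface and ConflictError are hypothetical.

```python
class ConflictError(Exception):
    """Raised by the (hypothetical) server when a replayed update conflicts."""

class CodaStyleClient:
    """Toy model of Coda's emulate/reintegrate cycle; names are illustrative."""

    def __init__(self, server):
        self.server = server
        self.connected = True
        self.replay_log = []             # operations performed while disconnected

    def write(self, name, data):
        if self.connected:
            self.server.write(name, data)
        else:
            self.replay_log.append((name, data))            # emulation

    def reintegrate(self):
        # On reconnection, replay the log; conflicting updates are kept
        # aside for the user to repair manually.
        conflicts = []
        for name, data in self.replay_log:
            try:
                self.server.write(name, data)
            except ConflictError as err:
                conflicts.append((name, err))
        self.replay_log.clear()
        self.connected = True
        return conflicts
```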
- Figure: The state-transition diagram of a Coda client with respect to a volume.

- AFS deployments in academia and government (hundreds)
- The security model required Kerberos
  - Many organizations were not willing to make the costly switch
- AFS (but not Coda) was not integrated into the Unix FS: separate "ls", different – though similar – API
- Session semantics are not appropriate for many applications

- Goals
  - Efficient use of large main memories
  - Support for multiprocessor workstations
  - Efficient network communication
  - Diskless operation
  - Exact emulation of UNIX FS semantics
- A location-transparent UNIX FS

- Naming
  - Local prefix table which maps path-name prefixes to servers (sketched below)
  - Locations are cached
  - Otherwise, location information is embedded in remote stubs in the tree hierarchy
- Caching
  - Needs sequential consistency
  - If one client wants to write, caching is disabled on all open clients; this is assumed not to hurt much since it doesn't happen often
- No security between kernels; everything runs over a trusted network
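A prefix table can be sketched as a longest-matching-prefix lookup from path to server. The names below are invented, and a real Sprite client would locate the server for an unknown prefix dynamically rather than raising an error.

```python
class PrefixTable:
    """Toy Sprite-style prefix table: map path-name prefixes to servers and
    route each path to the server owning its longest matching prefix."""

    def __init__(self):
        self.table = {}                  # prefix -> server name

    def add(self, prefix, server):
        self.table[prefix] = server

    def server_for(self, path):
        best = ""
        for prefix in self.table:
            if path.startswith(prefix) and len(prefix) > len(best):
                best = prefix
        if not best:
            raise LookupError("no prefix for " + path)   # real Sprite locates it dynamically
        return self.table[best]

# Usage (hypothetical servers):
# pt = PrefixTable(); pt.add("/", "rootsrv"); pt.add("/users", "usersrv")
# pt.server_for("/users/ann/notes.txt")  # -> "usersrv"
```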
- The best way to implement something depends very highly on the goals you want to achieve
- Always start with goals before deciding on consistency semantics
Speaker notes:
- A file is a named collection of related information that is recorded on some permanent storage.
- Access transparency – use the same mechanism to access a file whether it is local or remote, i.e., map remote files into the local file system name space.
- Harder to mask failure. Performance issues – caching, etc.? Can't really do.
- COTS / legacy apps. Does it really matter where it is? One copy versus many copies. Etc.
- Lampson – hints for push model.
- A server exports one or more of its directories to remote clients. Clients access exported directories by mounting them. The contents are then accessed as if they were local.
- Pros: the server is stateless, i.e. no state about open files. Cons: locking is difficult; no concurrency control.
- No consistency semantics – things marked dirty are flushed within 30 seconds; non-dirty items are checked every 5 seconds. Bad performance with heavy load, etc.
- A user process wants to open a file with pathname P. The kernel resolves that it's a Vice file & passes it to Venus on that workstation. One of the LWPs uses the cache to examine each directory component D of P. When processing a pathname component, Venus identifies the server to be contacted by examining the volume field of the fid. Authentication, protection checking, and network failures complicate the matter considerably.
