Scale and Performance in a Distributed File System
Prof. Giuseppe Cattaneo Giorgio Vitiello
Overview
◎Distributed File System
◎Andrew File System
◎Network File System
◎AFS vs NFS
◎Conclusions
2
Distributed File System
◎File System (FS) controls how data is stored
and retrieved
◎A Distributed File System (DFS) is a set of client and server services that lets multiple hosts access shared information over a network
3
Andrew File System
◎Andrew is a distributed computing
environment that has been under development
at Carnegie Mellon University since 1983
◎Each individual at CMU may eventually possess
an Andrew workstation
◎A fundamental component of Andrew is the
distributed file system
4
Andrew File System
◎It was the first DFS designed to scale to a large number of users
◎Andrew File System presents a homogeneous,
location-transparent file name space to all
the client workstations
5
Andrew File System
◎The goal of Andrew File System is scalability
◎Large scale affects a distributed system in two
ways
○ it degrades performance
○ it complicates administration and day-to-day
operation
◎Andrew copes successfully with these
concerns
6
First Prototype
◎Validate the basic file system architecture
○ obtain feedback on our design as rapidly as possible
○ build a system that was usable enough to make that
feedback meaningful
◎Prototype used by 400 users
◎100 Sun2 workstations with 65 MB local disks
◎6 servers (Sun2s or Vax-750s), each with 2-3 disks of 400 MB
◎Client and server nodes ran 4.2BSD UNIX
7
First Prototype
◎VICE: a set of trusted servers that presents a homogeneous, location-transparent file name space to all the clients
◎VENUS: a client process that caches files from Vice and stores modified copies of files back on the servers they came from
8
9
● Venus would rendezvous with a process listening at a
well-known network address on a server
● This process then created a dedicated process to deal
with all future requests from the client
● All communication and manipulation of data structures
between server processes took place via files in the
underlying file system
10
● Each server contained a directory hierarchy mirroring
the structure of the Vice files stored on it
● The directory hierarchy contained Stub directories that
represented portions of the Vice name space located on
other servers. If a file were not on a server, the search
for its name would end in a stub directory which
identified the server containing that file
11
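The stub-directory mechanism described above amounts to a pathname walk that either finds the data locally or ends at a pointer to another server. The following is a minimal sketch, with invented class and field names (Node, stub_server, the example tree) that stand in for details the slides do not give:

```python
# Hypothetical sketch of the prototype's location mechanism: each server mirrors
# the Vice directory hierarchy, and subtrees stored elsewhere are represented by
# stub directories that record the server holding them.

class Node:
    def __init__(self, name, stub_server=None):
        self.name = name
        self.stub_server = stub_server   # set only for stub directories
        self.children = {}

def locate(root, path):
    """Walk a pathname; return ('here', node) if the file is stored locally,
    or ('remote', server) if the walk ends in a stub directory."""
    node = root
    for component in path.strip("/").split("/"):
        if node.stub_server is not None:
            return ("remote", node.stub_server)
        node = node.children[component]
    if node.stub_server is not None:
        return ("remote", node.stub_server)
    return ("here", node)

# Example tree: /vice/usr lives on server "vice2", everything else locally.
root = Node("/")
vice = Node("vice"); root.children["vice"] = vice
vice.children["usr"] = Node("usr", stub_server="vice2")
vice.children["local.txt"] = Node("local.txt")

print(locate(root, "/vice/local.txt"))   # ('here', <Node>)
print(locate(root, "/vice/usr/foo.c"))   # ('remote', 'vice2')
```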
The Benchmark
◎This benchmark consists of a command script
that operates on a collection of files
constituting an application program
◎Load Unit: load placed on a server by a single
client workstation running this benchmark
◎A load unit corresponds to about five Andrew
users
◎The input to the benchmark is a read-only
source subtree consisting of about 70 files
12
Phases of the benchmark
◎ MakeDir: constructs a target subtree that is identical in
structure to the source subtree
◎ Copy: copies every file from the source subtree to the
target subtree
◎ ScanDir: recursively traverses the target subtree and
examines the status of every file in it. It doesn’t actually
read the contents of any file
◎ ReadAll: scans every byte of every file in the target
subtree once
◎ Make: compiles and links all the files in the target
subtree
13
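The five phases correspond to ordinary file-system operations. The sketch below re-creates their intent in Python as an illustration only; the real benchmark is a command script over a source subtree of about 70 files, and the directory names here are hypothetical:

```python
# Illustrative re-creation of the benchmark's five phases over a small tree.
import os, shutil, subprocess

SRC, DST = "src_tree", "dst_tree"   # hypothetical paths

def make_dir():          # MakeDir: target subtree identical in structure to the source
    for dirpath, _, _ in os.walk(SRC):
        rel = os.path.relpath(dirpath, SRC)
        os.makedirs(os.path.join(DST, rel), exist_ok=True)

def copy():              # Copy: every file from source to target
    for dirpath, _, filenames in os.walk(SRC):
        rel = os.path.relpath(dirpath, SRC)
        for f in filenames:
            shutil.copyfile(os.path.join(dirpath, f), os.path.join(DST, rel, f))

def scan_dir():          # ScanDir: stat every file, never read its contents
    for dirpath, _, filenames in os.walk(DST):
        for f in filenames:
            os.stat(os.path.join(dirpath, f))

def read_all():          # ReadAll: read every byte of every file once
    for dirpath, _, filenames in os.walk(DST):
        for f in filenames:
            with open(os.path.join(dirpath, f), "rb") as fh:
                fh.read()

def make():              # Make: compile and link everything in the target subtree
    subprocess.run(["make", "-C", DST], check=False)   # requires a Makefile and make
```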
14
Performance Observations
◎Venus used two caches
○ one for files
○ the other for status information about files
◎The most frequent system calls
○ TestAuth, validated cache entries
○ GetFileStat, obtained status information about files
absent from the cache
15
● Snapshot of the caches on 12 machines
○ 81% file-cache hit ratio
○ 82% status-cache hit ratio
○ only 6% of Vice calls (Fetch and Store) actually transferred files
16
● The total running time for the benchmark as a function
of server load
● The table also shows the average response time for the
most frequent Vice operation, TestAuth
17
● Software on the servers maintained statistics about CPU and disk utilization and about data transfers to and from the disks
● Table IV presents these data for four servers over a 2-week period
● As the CPU utilizations in the table show, the server loads were not evenly balanced
● CPU utilization was about 40 percent on the two most heavily used servers
18
Performance Observations
◎The performance bottleneck in our prototype
was the server CPU (high CPU utilization)
○ the frequency of context switches between the many
server processes
○ the time spent by the servers in traversing full
pathnames presented by workstations
19
Performance Observations
◎Significant performance improvement is
possible if
○ we reduce the frequency of cache validity checks
○ reduce the number of server processes
○ require workstations rather than the servers to do
pathname traversals
○ balance server usage by reassigning users
20
Changes for performance
◎Unchanged aspects
○ workstations cache entire files from a collection of
dedicated autonomous servers
○ Venus and the server code run as user-level processes
○ communication between servers and clients is based
on the RPC paradigm
○ the mechanism in the workstation kernels to
intercept and forward file requests to Venus is the
same as in the prototype
21
Changes for performance
◎Changed aspects
○ To enhance performance
● Cache management
● Name resolution
● Communication and server process structure
● Low-level storage representation
○ To improve the operability of the system
22
Cache management
◎Caching is the key to Andrew's ability to scale well
◎Venus now caches the contents of directories
and symbolic links in addition to files
◎There are still two separate caches:
○ for status
○ for data
◎Venus uses LRU to keep each of them
bounded in size
23
Cache management
◎ Modifications to a cached file are done locally and are
reflected back to Vice when the file is closed
◎ Venus intercepts only the opening and closing of files
and doesn’t participate in the reading or writing of
individual bytes on a cached copy
◎ For integrity, modifications to a directory are made directly on the server responsible for that directory
◎ Venus reflects the change in its cached copy to avoid
refetching the directory
24
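The open/close contract described on this slide can be summarized in a toy model. The code below is an illustration only, with invented class names and a dirty-set bookkeeping detail that the slides do not specify; it shows a whole file being fetched on open, modified locally, and stored back on close:

```python
# Toy model of Venus's whole-file caching: fetch on open, operate locally,
# store the modified copy back to the server on close.

class Server:
    def __init__(self):
        self.files = {}                       # fid -> bytes
    def fetch(self, fid):
        return self.files.get(fid, b"")
    def store(self, fid, data):
        self.files[fid] = data

class Venus:
    def __init__(self, server):
        self.server = server
        self.cache = {}                       # fid -> bytearray (local disk cache)
        self.dirty = set()

    def open(self, fid):
        if fid not in self.cache:             # fetch the entire file once
            self.cache[fid] = bytearray(self.server.fetch(fid))
        return self.cache[fid]

    def write(self, fid, data):               # local modification only
        buf = self.open(fid)
        buf[:] = data
        self.dirty.add(fid)

    def close(self, fid):
        if fid in self.dirty:                 # reflect local modifications back to Vice
            self.server.store(fid, bytes(self.cache[fid]))
            self.dirty.discard(fid)

server = Server()
venus = Venus(server)
venus.write("fid-1", b"hello")
print(server.files.get("fid-1"))              # None: change not yet visible on the server
venus.close("fid-1")
print(server.files["fid-1"])                  # b'hello'
```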
Cache management
◎Callback mechanism to keep consistent cache
entries
○ Venus assumes that cache entries are valid unless
otherwise notified
○ When a workstation caches a file, the server promises
to notify it before allowing a modification by any
other workstation
◎Callback reduces the number of cache
validation requests received by servers
25
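A hedged sketch of the callback idea follows. Names and structure are invented, and real AFS callbacks also deal with expiry and with callbacks lost to machine or network failures; the point shown is only that the server records which workstations cache a file and breaks the promise before accepting an update from anyone else:

```python
# Minimal sketch of callback-based cache consistency.
from collections import defaultdict

class Server:
    def __init__(self):
        self.data = {}
        self.callbacks = defaultdict(set)     # fid -> clients holding a callback

    def fetch(self, fid, client):
        self.callbacks[fid].add(client)       # promise: notify before any other update
        return self.data.get(fid)

    def store(self, fid, value, writer):
        for client in self.callbacks[fid] - {writer}:
            client.break_callback(fid)        # invalidate remote cached copies first
        self.callbacks[fid] = {writer}
        self.data[fid] = value

class Client:
    def __init__(self, server):
        self.server = server
        self.cache, self.valid = {}, set()

    def read(self, fid):
        if fid in self.valid:                 # no network traffic while the callback holds
            return self.cache[fid]
        self.cache[fid] = self.server.fetch(fid, self)
        self.valid.add(fid)
        return self.cache[fid]

    def write(self, fid, value):
        self.cache[fid] = value
        self.valid.add(fid)
        self.server.store(fid, value, self)

    def break_callback(self, fid):
        self.valid.discard(fid)               # next read must revalidate with the server

srv = Server()
a, b = Client(srv), Client(srv)
a.write("f", 1)
print(b.read("f"))       # 1, fetched with a callback
a.write("f", 2)          # server breaks B's callback first
print(b.read("f"))       # 2, refetched because the callback was broken
```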
Cache management
◎Merits
○ reducing cache validation traffic
○ callback reduces the load on servers considerably
○ callback makes it feasible to resolve pathnames on
workstations
◎Defects
○ callback complicates the system because each server
and Venus now maintain callback state information
○ there is a potential for inconsistency if the callback
state maintained by a Venus gets out of sync with the
corresponding state maintained by the servers
26
Name Resolution
◎In a conventional 4.2BSD system
○ inode: a file has a unique, fixed-length name
○ one or more variable-length pathnames map to this inode
◎The routine that performs this mapping, namei, is usually one of the most heavily used parts of the kernel
◎This caused considerable CPU overhead on the servers and was an obstacle to scaling
27
Name Resolution
◎The notion of two-level names
◎Each Vice file or directory is now identified by a
unique fixed-length Fid (96 bit)
◎Each entry in a directory maps a component of a
pathname to a fid
◎Venus performs the logical equivalent of a namei
operation, and maps Vice pathnames to fids
◎Servers are presented with fids and are unaware
of pathnames
28
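Client-side name resolution then becomes a walk over cached directories, each mapping one pathname component to a fid. The sketch below is illustrative only: the tuple layout of the fid and the table contents are assumptions (the slide states only that a Fid is 96 bits), and a real Venus would fetch a directory from the server on a cache miss:

```python
# Sketch of pathname-to-fid resolution performed by Venus: servers are
# presented with fids, never pathnames.

ROOT_FID = ("vol1", 1, 0)     # illustrative 3-part fid

# Venus's cached directory contents: fid of a directory -> {component: fid}
cached_dirs = {
    ROOT_FID:        {"usr": ("vol1", 2, 0)},
    ("vol1", 2, 0):  {"proj": ("vol2", 7, 1)},
    ("vol2", 7, 1):  {"main.c": ("vol2", 9, 3)},
}

def resolve(path):
    """Logical equivalent of a namei operation, run on the workstation."""
    fid = ROOT_FID
    for component in path.strip("/").split("/"):
        fid = cached_dirs[fid][component]     # would fetch the directory on a miss
    return fid

print(resolve("/usr/proj/main.c"))            # ('vol2', 9, 3) is what the server sees
```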
Communication and Server Process Structure
◎The use of a server process per client didn’t scale
well
◎Server processes couldn’t cache critical shared
information in their address spaces
◎4.2BSD doesn’t permit processes to share virtual
memory
◎The redesign solves these problems by using a
single process to service all clients of a server
29
Communication and Server Process Structure
◎A user-level mechanism supports multiple nonpreemptive Lightweight Processes (LWPs) within one process (sketched below)
◎Context switching between LWPs costs only on the order of a few procedure-call times
◎The number of LWPs is typically five
◎RPC mechanism is implemented entirely outside
the kernel and is capable of supporting many
hundreds or thousands of clients per server
30
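A rough analogy for the nonpreemptive LWPs, using Python generators as a stand-in for the LWP package (which the slides do not detail): each worker runs until it explicitly yields, so a context switch is just a procedure call back into the scheduler.

```python
# Cooperative (nonpreemptive) scheduling inside one process, in the spirit of
# the LWP package: workers yield explicitly, a round-robin scheduler resumes them.
from collections import deque

def worker(name, requests):
    for req in requests:
        print(f"{name} handling {req}")
        yield                                  # explicit yield point = context switch

def run(lwps):
    ready = deque(lwps)
    while ready:
        lwp = ready.popleft()
        try:
            next(lwp)                          # resume until the next yield
            ready.append(lwp)
        except StopIteration:
            pass                               # this LWP has finished

run([worker("lwp-1", ["fetch A", "store B"]),
     worker("lwp-2", ["fetch C"])])
```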
Low-Level Storage Representation
◎ To limit the cost of the namei operations involved in accessing data via pathnames, files are accessed by their inodes
◎ The internal inode interface is not visible to user-level processes, so an appropriate set of system calls was added
◎ The vnode information for a Vice file identifies the inode of the file storing its data
◎ Data access consists of indexing a fid into a table to look up vnode information, followed by an iopen call to read or write the data (sketched below)
31
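A schematic of that data-access path, with an invented table layout and a stub standing in for the iopen system call that was added to 4.2BSD; it only shows the fid-to-vnode-to-inode chain, not real kernel behavior:

```python
# Schematic of the low-level storage path: fid -> vnode info -> inode -> data,
# bypassing namei entirely. Table contents and iopen() are illustrative stubs.

vnode_table = {
    ("vol2", 9, 3): {"inode": 12345, "length": 2048, "version": 7},
}

def iopen(inode, mode):
    """Stand-in for the iopen call used to access a file directly by inode."""
    print(f"opening inode {inode} for {mode}")
    return object()                            # would be a file handle

def access(fid, mode="read"):
    vnode = vnode_table[fid]                   # index the fid into the vnode table
    return iopen(vnode["inode"], mode)         # then read/write via the inode

access(("vol2", 9, 3))
```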
Overall Design
◎Suppose a user process opens a file with
pathname P on a workstation
◎The kernel detects that it is a Vice file and
passes it to Venus on that workstation
◎Venus now uses the cache to examine each
directory component D of P in succession
32
Overall Design
Case 1: If D is in the cache and has a callback on it, it is used without any network communication
Case 2: If D is in the cache but has no callback on it, the appropriate server is contacted, a new copy of D is fetched if it has been updated, and a callback is established on it
Case 3: If D is not in the cache, it is fetched from the appropriate server, and a callback is established on it
33
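The three cases reduce to a short decision procedure. The helper names below (fetch, fetch_if_updated, the FakeServer class) are placeholders for the corresponding client/server calls, not the real Vice interface:

```python
# Decision procedure for one directory component D of pathname P, as a Venus
# LWP might run it. Helper names are illustrative placeholders.

def lookup_component(cache, server, d):
    entry = cache.get(d)
    if entry is not None and entry["callback"]:
        return entry["copy"]                       # Case 1: valid copy, no network traffic
    if entry is not None:                          # Case 2: revalidate with the server
        fresh = server.fetch_if_updated(d, entry["version"])
        if fresh is not None:
            entry["copy"], entry["version"] = fresh
        entry["callback"] = True                   # callback (re)established
        return entry["copy"]
    copy, version = server.fetch(d)                # Case 3: not cached at all
    cache[d] = {"copy": copy, "version": version, "callback": True}
    return copy

class FakeServer:                                  # stand-in for a Vice server
    def __init__(self):
        self.dirs = {"usr": ("usr-contents-v2", 2)}
    def fetch(self, d):
        return self.dirs[d]
    def fetch_if_updated(self, d, version):
        copy, v = self.dirs[d]
        return (copy, v) if v != version else None

cache, srv = {}, FakeServer()
print(lookup_component(cache, srv, "usr"))         # Case 3: fetched, callback established
print(lookup_component(cache, srv, "usr"))         # Case 1: served from the cache
```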
Overall Design
◎File F is identified; the directories along P and F itself are now in the cache, with callbacks on them
◎A cached copy of F is created and Venus returns to the kernel
◎The kernel opens the cached copy of F and returns its handle to the user process
◎Venus regains control when F is closed; if F has been modified locally, Venus updates it on the server
34
Overall Design
◎Future references to this file will involve no
network communication at all, unless a
callback is broken on a component of P
◎An LRU replacement algorithm is periodically
run to reclaim cache space
◎Problem: since the other LWPs in Venus may
be concurrently servicing file access requests
from other processes, accesses to cache data
structures must be synchronized
35
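A small sketch of these two requirements together: an LRU-bounded cache whose data structures are protected against concurrent access. The capacity, the OrderedDict layout, and the use of a thread lock are assumptions standing in for whatever synchronization the LWP package actually provides:

```python
# LRU-bounded cache with explicit synchronization, standing in for Venus's
# cache data structures that multiple LWPs may touch concurrently.
from collections import OrderedDict
from threading import Lock

class BoundedCache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()               # oldest entries first
        self.lock = Lock()                         # stand-in for LWP synchronization

    def get(self, key):
        with self.lock:
            if key not in self.entries:
                return None
            self.entries.move_to_end(key)          # mark as most recently used
            return self.entries[key]

    def put(self, key, value):
        with self.lock:
            self.entries[key] = value
            self.entries.move_to_end(key)
            while len(self.entries) > self.capacity:
                self.entries.popitem(last=False)   # reclaim least recently used entry

cache = BoundedCache(capacity=2)
cache.put("a", 1); cache.put("b", 2); cache.put("c", 3)
print(cache.get("a"), cache.get("c"))              # None 3 — "a" was evicted
```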
Overall Design
◎Design converged on the following consistency
semantics
○ Writes to an open file by a process on a workstation are
visible to all other processes on the workstation
immediately but are invisible elsewhere in the network
○ Once a file is closed, the changes made to it are visible to
new opens anywhere on the network
○ All other file operations are visible everywhere on the
network immediately after the operation completes
○ Application programs have to cooperate in performing the
necessary synchronization if they care about the
serialization of these operations
36
Second Prototype
◎Two questions
○ Has the anticipated improvement in scalability been realized?
○ What are the characteristics of the system in normal operation?
◎Client workstations
○ IBM RTs
○ 40 or 70 MB of local disk storage
○ first models with 4 MB of RAM
37
● Benchmark
○ An Andrew workstation is 19% slower than a stand-alone workstation
○ The first prototype was 70% slower
○ The Copy and Make phases are the most susceptible to server load
38
● CPU and disk utilization on the server during the benchmark
○ CPU - from 8.1% to 70.9%
○ Disk - from 2.7% to 23.6%
● CPU still limits performance in our system
● Our design changes have improved scalability considerably
○ At a load of 20, the system is still not saturated (50 Andrew users)
39
40
Comparison with a remote-open file system
◎Advantages of caching entire files on local disks in AFS
○ Locality: server load and network traffic are reduced
○ Whole-file transfer approach contacts servers only on
opens and closes
○ Disk caches retain their entries across reboots
○ Caching of entire files simplifies cache management
41
Comparison with a remote-open file system
◎Workstations require local disks for
acceptable performance
◎Files that are larger than the local disk cache
cannot be accessed at all
◎Concurrent read and write semantics across workstations are impossible
42
Comparison with a remote-open file system
◎Could an alternative design have produced
equivalent or better results?
◎How critical to scaling are caching and whole-
file transfer?
◎AFS vs NFS
43
Network File System
◎NFS was developed by Sun Microsystems in 1984
◎It builds on the Open Network Computing Remote Procedure Call (ONC RPC) system
◎Client-server protocol
◎Shares a file, a directory, or a whole file system
◎Runs on the Unix operating system
44
Network File System
◎NFSv1 (1984)
○ developed in-house for experimental purposes
◎NFSv2 (March 1989)
○ released for commercial use, UDP transport, stateless
◎NFSv3 (June 1995)
○ improvements over v2, UDP/TCP transport, stateless
◎NFSv4 (April 2003)
○ stateful server, TCP transport
○ focus on performance, accessibility, scalability, and security
45
46
AFS vs NFS
◎Scalability assessment comparing AFS with NFS
○ NFS is a mature product, not a prototype
○ Sun has invested heavily in NFS
○ different architecture, same hardware
◎Benchmark
○ 18 Sun3 workstations
○ one Sun3 server
○ clients and server on a 10 Mbit Ethernet
○ experiments on Andrew were run with both a cold cache and a warm cache
47
48
● NFS performs slightly better than Andrew at low loads, but not at high loads
● ScanDir, ReadAll, and Make show different performance
● Andrew outperforms NFS on ReadAll
49
● Benchmark time as a function of load units
● CPU utilization as a function of load units
○ Load 1: 22% NFS - 3% AFS
○ Load 18: 100% NFS - 38% AFS with a cold cache, 42% with a warm cache
Server Side
● Load 1
○ NFS, Disk 1: 9%
○ NFS, Disk 2: 3%
○ AFS, Disk 1: 4%
● Load 18
○ NFS, Disk 1: 95%
○ NFS, Disk 2: 19%
○ AFS, Disk 1: 33%
50
Conclusions
◎ AFS is more scalable than NFS
◎ AFS would reduce its overhead further if it were moved into the kernel
◎ Andrew is implemented in user space
◎ NFS runs in kernel space
51
Conclusions
◎ AFS testing deployment
○ 400 workstations, 16 servers, 3500 Andrew users
◎ Goals
○ scaling to 5000 workstations
○ moving Venus and the server code into the kernel
◎ AFS is a strong file system
◎ Excellent performance
◎ It compares favorably with the most prominent
alternative distributed file system
52
Thanks!
Any questions?
53


Editor's Notes

  1. Andrew File System presenta uno spazio di nomi di file omogeneo e trasparente alla posizione su tutte le workstation client
  2. Il nostro obiettivo principale nella creazione di un prototipo era di convalidare l'architettura di base del file system
  3. Per massimizzare il numero di client che possono essere supportati da un server, la maggior parte del lavoro possibile viene eseguita da Venus anziché da Vice
  4. Poiché 4.2BSD non consente la condivisione di spazi di indirizzi tra processi, tutte le comunicazioni e la manipolazione di strutture di dati tra processi del server sono avvenute tramite file nel file system sottostante
  5. Venus si incontrerà con un processo in ascolto su un noto indirizzo di rete su un server Questo processo ha quindi creato un processo dedicato per gestire tutte le richieste future del client Tutte le comunicazioni e la manipolazione delle strutture di dati tra i processi del server sono avvenute tramite file nel file system sottostante
  6. Ogni server conteneva una gerarchia di directory che rispecchiava la struttura dei file Vice memorizzati su di esso. La gerarchia della directory conteneva le directory Stub che rappresentavano porzioni dello spazio dei nomi dei vice situati su altri server. Il database delle posizioni che mappa i file sui server è stato quindi incorporato nella struttura dei file. Se un file non si trovava su un server, la ricerca del suo nome terminerebbe in una directory stub che identificava il server contenente quel file
  7. Questo benchmark consiste in uno script di comando che opera su una raccolta di file che costituiscono un programma applicativo Unità di carico: carica posta su un server da una singola workstation client che esegue questo benchmark Un'unità di carico corrisponde a circa cinque utenti Andrew L'input per il benchmark è una sottostruttura sorgente di sola lettura composta da circa 70 file
  8. MakeDir: crea una sottostruttura di destinazione identica nella struttura alla sottostruttura di origine Copia: copia ogni file dalla sottostruttura di origine alla sottostruttura di destinazione ScanDir: attraversa in modo ricorsivo il sottostruttura di destinazione ed esamina lo stato di ogni file in esso contenuto. In realtà non legge il contenuto di alcun file ReadAll: esegue la scansione di ogni byte di ogni file nella sottostruttura di destinazione una volta Crea: compila e collega tutti i file nella sottostruttura di destinazione
  9. Su una workstation Sun 2 con un disco locale, questo benchmark impiega circa 1000 secondi per il completamento quando tutti i file sono ottenuti localmente. I tempi corrispondenti per le altre macchine sono mostrati nella Tabella I.
  10. La voce TestAuth ha convalidato le voci della cache, mentre GetFileStat ha ottenuto informazioni sullo stato dei file assenti dalla cache.
  11. Un'istantanea delle cache di 12 macchine ha mostrato un tasso medio di riscontri tra cache e file dell'81%, e un rapporto di riscontri tra cache e stato dell'82%, La tabella mostra anche che solo il 6% delle chiamate a Vice (Recupera e archivia) ha effettivamente comportato il trasferimento di file e che il rapporto delle chiamate di recupero per le chiamate di archivio era approssimativamente 2:1.
  12. Il tempo totale di esecuzione del benchmark in funzione del carico del server La tabella mostra anche il tempo medio di risposta per l'operazione Vice più frequente, TestAuth
  13. Un software sui server per mantenere le statistiche sull'uso della CPU e del disco e sui trasferimenti di dati da e verso i dischi La Tabella IV presenta questi dati per quattro server nell'arco di un periodo di 2 settimane Come mostrano gli utilizzi della CPU nella tabella, i carichi dei server non erano equamente bilanciati Utilizzo della CPU di circa il 40 percento per i due server più utilizzati.
  14. la frequenza degli switch di contesto tra i molti processi del server e il tempo trascorso dai server nel percorrere i percorsi completi presentati dalle workstation.
  15. il miglioramento significativo delle prestazioni è possibile se riduciamo la frequenza dei controlli di validità della cache, riduciamo il numero di processi del server, richiediamo workstation anziché i path per attraversare il percorso e bilanciamo l'utilizzo del server riassegnando gli utenti.
  16. Le wrokstations memorizzano nella cache interi file da una raccolta di server autonomi dedicati venus e il codice del server vengono eseguiti come processi a livello di utente la comunicazione tra server e client si basa sul paradigma RPC il meccanismo nei kernel delle workstation per intercettare e inoltrare le richieste di file a Venus è lo stesso del prototipo RPC = Remote Procedure Call Quindi l'RPC consente a un programma di eseguire subroutine "a distanza" su computer remoti, accessibili attraverso una rete. Essenziale al concetto di RPC è l'idea di trasparenza: la chiamata di procedura remota deve essere infatti eseguita in modo il più possibile analogo a quello della chiamata di procedura locale; i dettagli della comunicazione su rete devono essere "nascosti" (resi trasparenti) all'utilizzatore del meccanismo.
  17. Per migliorare le prestazioni Gestione della cache Risoluzione del nome Struttura del processo di comunicazione e server Rappresentazione dello storage di basso livello Per migliorare l'operabilità del sistema
  18. Caching, la chiave per la capacità di Andrew di scalare bene Venus ora memorizza nella cache il contenuto delle directory e dei collegamenti simbolici oltre ai file Ci sono ancora due cache separate: per lo stato per i dati Venus usa LRU per mantenere ognuna di esse delimitata in dimensioni
  19. Le modifiche a un file memorizzato nella cache vengono eseguite localmente e vengono riflesse su Vice quando il file viene chiuso Venus intercetta solo l'apertura e la chiusura di file e non partecipa alla lettura o scrittura di singoli byte su una copia cache Per l'integrità, le modifiche a una directory vengono effettuate direttamente sul server responsabile di quella directory Venus riflette la modifica nella sua copia memorizzata nella cache per evitare il refetching della directory
  20. Meccanismo di callback per mantenere le voci della cache coerenti Venus presume che le voci della cache siano valide salvo diversa comunicazione Quando una workstation memorizza nella cache un file, il server promette di notificarlo prima di consentire una modifica da parte di un'altra workstation La richiamata riduce il numero di richieste di convalida della cache ricevute dai server A small amount of cache validation traffic is still present, usually to replace callbacks lost because of machine or network failures Una piccola quantità di traffico di validazione della cache è ancora presente, in genere per sostituire i callback persi a causa di errori della macchina o della rete
  21. Merits: by reducing cache validation traffic, callback considerably reduces the load on the servers, and it makes it feasible to resolve pathnames on the workstations. Drawbacks: callback complicates the system, since each server and each Venus must now maintain callback state information, and there is a potential for inconsistency if the callback state maintained by a Venus gets out of sync with the corresponding state maintained by the servers.
  22. In a conventional 4.2BSD system, a file has a unique, fixed-length name, its inode, and one or more variable-length pathnames that map to this inode. The routine that performs this mapping, namei, is typically one of the most heavily used and time-consuming parts of the kernel. In the prototype this caused considerable CPU overhead on the servers and was an obstacle to scalability.
  23. SOLUTION: the notion of two-level names. Each Vice file or directory is now identified by a unique, fixed-length Fid (96 bits). Each entry in a directory maps a pathname component to a fid. Venus performs the logical equivalent of a namei operation, mapping Vice pathnames to fids; the servers are presented with fids and are unaware of pathnames.
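A minimal sketch of such a fid and of a directory entry that maps one pathname component to it. The breakdown into three 32-bit fields (volume, vnode number, uniquifier) follows the usual description of the AFS fid; the field and type names here are illustrative.

```c
/* Sketch of a 96-bit fid and a directory entry mapping one pathname
 * component to it.  Names are illustrative. */
#include <stdint.h>
#include <stdio.h>

struct fid {
    uint32_t volume;      /* which volume the object lives in            */
    uint32_t vnode;       /* index into that volume's vnode table        */
    uint32_t uniquifier;  /* distinguishes reuses of the same vnode slot */
};

/* One entry of a (cached) Vice directory: component name -> fid. */
struct dir_entry {
    char       name[32];
    struct fid fid;
};

int main(void)
{
    struct dir_entry e = { "thesis.tex", { 7u, 42u, 1u } };
    printf("%s -> fid <%u.%u.%u>\n",
           e.name, e.fid.volume, e.fid.vnode, e.fid.uniquifier);
    printf("fid size: %zu bytes (96 bits)\n", sizeof(struct fid));
    return 0;
}
```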
  24. Using one server process per client did not scale well: server processes could not cache critical shared information in their address spaces, because 4.2BSD does not allow processes to share virtual memory. The redesign solves these problems by using a single process to serve all clients of a server.
  25. A user-level mechanism supports multiple nonpreemptive Lightweight Processes (LWPs) within a single process. Context switching between LWPs costs only on the order of a few procedure-call times. The number of LWPs (typically five) is determined when a server is initialized and remains fixed thereafter. The RPC mechanism is implemented entirely outside the kernel and is capable of supporting many hundreds or thousands of clients per server.
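The real LWPs are nonpreemptive user-level coroutines inside one process, not kernel threads; as a rough modern analogue only, the sketch below uses a fixed pool of POSIX threads created at start-up (like the typical five LWPs) to show why a single address space lets all workers share cached state.

```c
/* Rough analogue of the single server process with a fixed pool of
 * lightweight workers: five threads share one address space, so they all
 * see the same in-memory state.  Not the actual LWP package.
 * Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 5           /* fixed when the server is initialized */
#define REQUESTS    20

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_cache_hits;   /* state visible to every worker        */
static int next_request;

static void *worker(void *arg)
{
    int id = *(int *)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        if (next_request >= REQUESTS) {      /* no work left */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        int req = next_request++;
        shared_cache_hits++;                 /* all workers update it */
        pthread_mutex_unlock(&lock);
        printf("worker %d served request %d\n", id, req);
    }
}

int main(void)
{
    pthread_t tid[NUM_WORKERS];
    int       id[NUM_WORKERS];

    for (int i = 0; i < NUM_WORKERS; i++) {
        id[i] = i;
        pthread_create(&tid[i], NULL, worker, &id[i]);
    }
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);

    printf("shared state after all workers: %d hits\n", shared_cache_hits);
    return 0;
}
```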
  26. As mentioned in Section 3.2, we were wary of the cost of the namei operations involved in accessing data via pathnames. We therefore decided to access files by their inodes rather than by pathnames. Since the internal inode interface is not visible to user-level processes, an appropriate set of system calls had to be added. The vnode information for a Vice file identifies the inode of the file storing its data, so data access consists of indexing a fid into a table to look up the vnode information, followed by an iopen call to read or write the data.
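A minimal sketch of that data-access path, assuming a simple per-volume table: the fid's vnode number indexes the table, which records the inode holding the file's data, and a stubbed stand-in for the added iopen call then opens that inode directly, bypassing namei. Since iopen is a non-standard system call described in the paper, it is only simulated here; table contents and names are illustrative.

```c
/* Sketch of the fid -> vnode table -> inode data-access path.
 * iopen is simulated; everything here is illustrative. */
#include <stdint.h>
#include <stdio.h>

struct vnode_info {
    uint32_t inode;        /* inode storing this Vice file's data */
    uint32_t length;       /* (other status fields omitted)       */
};

/* Per-volume vnode table, indexed by the fid's vnode number. */
static struct vnode_info vnode_table[] = {
    { 0, 0 },
    { 12345, 2048 },       /* vnode 1 -> inode 12345 */
    { 67890,  512 },       /* vnode 2 -> inode 67890 */
};

#define VNODE_COUNT (sizeof vnode_table / sizeof vnode_table[0])

/* Stand-in for the added iopen system call: open a file by inode. */
static int iopen_stub(uint32_t inode)
{
    printf("iopen(inode %u)\n", inode);
    return 3;              /* pretend file descriptor */
}

static int open_by_fid(uint32_t vnode_number)
{
    if (vnode_number >= VNODE_COUNT)
        return -1;
    /* no pathname, no namei: go straight from fid to inode */
    return iopen_stub(vnode_table[vnode_number].inode);
}

int main(void)
{
    open_by_fid(1);
    return 0;
}
```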
  27. Remote file access walkthrough. Suppose a user process opens a file with pathname P on a workstation. The kernel, in resolving P, detects that it is a Vice file and passes it to Venus on that workstation. One of the LWPs comprising Venus then uses the cache to examine each directory component D of P in turn:
  28. - If D is in the cache and has a callback on it, it is used without any network communication. - If D is in the cache but has no callback on it, the appropriate server is contacted, a fresh copy of D is fetched if it has been updated, and a callback is established on it. - If D is not in the cache, it is fetched from the appropriate server and a callback is established on it. (A sketch of this lookup loop follows the walkthrough below.)
  29. When the target file F is identified, a current cache copy is created in the same way. Venus then returns to the kernel, which opens the cached copy of F and returns its handle to the user process. Thus, at the end of pathname traversal, all intermediate directories and the target file are in the cache with callbacks on them. Future references to this file involve no network communication at all, unless a callback is broken on a component of P. Venus regains control when the file is closed and, if it has been modified locally, updates it on the appropriate server. An LRU replacement algorithm is run periodically to reclaim cache space.
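The sketch below restates the lookup loop from this walkthrough as code: each component is resolved against the cache, applying the three cases (valid callback, cached without callback, not cached). Server contact is stubbed with printfs, and the structure and names are illustrative rather than the real Venus implementation.

```c
/* Sketch of the Venus-side lookup loop for pathname P.  Illustrative. */
#include <stdio.h>
#include <string.h>

enum cache_state { NOT_CACHED, CACHED_NO_CALLBACK, CACHED_WITH_CALLBACK };

struct cached_obj {
    const char      *name;
    enum cache_state state;
};

/* Tiny "cache" for the demo: state of each component of /vice/usr/f. */
static struct cached_obj cache[] = {
    { "vice", CACHED_WITH_CALLBACK },
    { "usr",  CACHED_NO_CALLBACK   },
    { "f",    NOT_CACHED           },
};

static struct cached_obj *lookup(const char *name)
{
    for (size_t i = 0; i < sizeof cache / sizeof cache[0]; i++)
        if (strcmp(cache[i].name, name) == 0)
            return &cache[i];
    return NULL;
}

/* Resolve one component, contacting the server only when needed. */
static void resolve_component(const char *name)
{
    struct cached_obj *obj = lookup(name);

    if (obj && obj->state == CACHED_WITH_CALLBACK) {
        printf("%s: cached with callback, no network traffic\n", name);
    } else if (obj) {
        printf("%s: cached, no callback -> validate with server, "
               "refetch if updated, re-establish callback\n", name);
        obj->state = CACHED_WITH_CALLBACK;
    } else {
        printf("%s: not cached -> fetch from server, establish callback\n",
               name);
    }
}

int main(void)
{
    /* Components of the pathname P = /vice/usr/f, target file last. */
    const char *components[] = { "vice", "usr", "f" };
    for (size_t i = 0; i < 3; i++)
        resolve_component(components[i]);
    return 0;
}
```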
  30. Cache consistency and concurrency control. Problem: since the other LWPs of Venus may concurrently service file access requests from other processes, accesses to the cache data structures must be synchronized.
  31. The design converged on the following consistency semantics: writes to an open file by a process on a workstation are visible immediately to all other processes on that workstation, but are invisible elsewhere in the network; once a file is closed, the changes made to it are visible to new opens anywhere on the network; all other file operations are visible everywhere on the network immediately after the operation completes; application programs must cooperate in performing the necessary synchronization if they care about the serialization of these operations.
  32. Two questions: Has the anticipated improvement in scalability been realized? What are the characteristics of the system in normal operation?
  33. Table VI shows that the Copy and Make phases are most susceptible to server load.
  34. Table VIII presents server CPU and disk utilizations in Andrew. The figures shown are averages over one-hour samples taken from 9:00 A.M. to 5:00 P.M. on weekdays. Most servers show a CPU utilization between 15 and 25 percent. One server, vice4, shows a utilization of 35.8 percent, but its disk utilization is not correspondingly high. The high standard deviation of its CPU utilization leads us to believe that this anomaly was caused by system maintenance activities that happened to run during the day rather than at night. Server vice9, on the other hand, shows a CPU utilization of 37.6 percent with a small standard deviation, and its disk utilization of 12.1 percent is the highest of any server. The high utilization is explained by the fact that this server stores the electronic bulletin boards, a collection of directories that are frequently accessed and modified by many different users.
  35. Caching entire files on local disks in the Andrew File System was motivated primarily by the following considerations of scale: - The locality of file references by typical users makes caching attractive: server load and network traffic are reduced. - A whole-file transfer approach contacts servers only on opens and closes. Read and write operations, which are far more numerous, are transparent to the servers and cause no network traffic. - The study by Ousterhout et al. [4] showed that most files in a 4.2BSD environment are read in their entirety. Whole-file transfer exploits this property by allowing the use of efficient bulk data transfer protocols. - Disk caches retain their entries across reboots, a surprisingly frequent event in workstation environments. Since few of the files accessed by a typical user are likely to have been modified elsewhere in the system, the amount of data refetched after a reboot is usually small. - Finally, caching entire files simplifies cache management: Venus only has to keep track of the files in its cache, not of their individual pages.
  36. Workstations require local disks for acceptable performance. Files larger than the local disk cache cannot be accessed. Concurrent read and write semantics across workstations are impossible.
  37. Could an alternative design have produced equivalent or better results? How critical are caching and whole-file transfer to scaling? AFS vs NFS.
  38. Evaluation of scalability, comparing AFS with NFS. NFS is a mature product, not a prototype, and Sun has invested heavily in it. Different architecture, same hardware and benchmark: 18 Sun3 workstations and one Sun3 server, with clients and server on a 10 Mbit Ethernet. The Andrew experiments consisted of two subsets: a Cold Cache set, in which the caches were cleared before each trial, and a Warm Cache set, in which the caches were left unaltered.