1. A Fast File System for UNIX
Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and Robert S. Fabry
Slides by Aleatha Parker-Wood
Tuesday, April 6, 2010
2. State of the Art
• Bell Labs UNIX file system for the PDP-11 (referred to as “old
filesystem” or OldFS)
• Disks are divided into physical partitions which contain a file system
• Linked list of free blocks stored in superblock
• inodes point either directly to blocks or to indirect blocks
3. Inode Layout in OldFS
[Figure: on-disk layout, inodes followed by data]
• All inodes are stored at the beginning of the disk region for the filesystem
• Incurs a long seek from a file's inode to its data on every access
• inodes for files are unlikely to be adjacent to their containing directory’s
inodes or to each other
• More seek time incurred
4. Data Layout in OldFS
• Completely agnostic to physical storage device
• Consecutive file blocks unlikely to be on the same cylinder
• Even more seeking
• 512 byte blocks (increased to 1024 bytes)
• Increasing the block size improved performance by a factor of 2
• Ergo: room for improvement!
5. Performance for OldFS
• Old system used only about 4% of disk bandwidth
• Performance good initially (175 KB/s), but degraded over time (30 KB/s)
• Free list became increasingly disorganized as file system was used...
• Blocks allocated in increasingly random locations
6. The Fast File System (FFS)
• Disk partitions divided into “cylinder groups”
• 4K minimum block size
• Ensures few levels of indirection (2 for files of up to 4 GB)
• Blocks are broken into fragments to accommodate small files
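The link between block size and levels of indirection can be checked with a little arithmetic. The sketch below is illustrative only: it assumes 12 direct pointers per inode (consistent with the 48K threshold at 4K blocks mentioned later in the deck) and 4-byte block pointers, neither of which is stated on this slide. The function name `levels_needed` is hypothetical.

```python
def levels_needed(file_size, block_size, direct=12, ptr_size=4):
    """Return how many levels of indirect blocks are needed to address file_size bytes.

    Assumptions (not from the slide): `direct` direct pointers per inode and
    `ptr_size`-byte block pointers.
    """
    ptrs_per_block = block_size // ptr_size
    reachable = direct * block_size   # bytes reachable through direct pointers
    span = block_size                 # bytes covered by one pointer at the current level
    level = 0
    while reachable < file_size:
        level += 1
        span *= ptrs_per_block        # each extra level multiplies the coverage
        reachable += span
    return level


print(levels_needed(2**32, 4096))  # 4K blocks
print(levels_needed(2**32, 1024))  # 1K blocks, as in the old file system
```

Under these assumptions a 4 GB file needs two levels of indirection with 4K blocks and three with 1K blocks, which is the motivation for the 4K minimum.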
7. Cylinder Groups
• Bookkeeping info stored for each cylinder group
• Backup copy of superblock
• Space for inodes
• A bit map of free blocks/fragments
• A static number of inodes allocated at creation time
• Bookkeeping info stored at a varying offset for each group (so losing
the top platter will not result in complete data loss)
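To make the free-space bit map concrete, here is a minimal sketch of searching one for a run of free fragments that fits inside a single block. The representation (a Python list of booleans, `True` meaning free) and the function name `find_free_run` are assumptions chosen for illustration, not the on-disk format.

```python
def find_free_run(bitmap, frags_per_block, need):
    """Return the index of the first run of `need` free fragments that lies
    entirely within one block, or None if no such run exists.

    bitmap: list of bools, one per fragment, True = free (illustrative layout).
    """
    for block_start in range(0, len(bitmap), frags_per_block):
        run = 0
        block_end = min(block_start + frags_per_block, len(bitmap))
        for i in range(block_start, block_end):
            run = run + 1 if bitmap[i] else 0   # extend or reset the current run
            if run == need:
                return i - need + 1
    return None


# Two blocks of four fragments each: [used, free, used, free] [free, free, free, used]
bitmap = [False, True, False, True, True, True, True, False]
print(find_free_run(bitmap, 4, 2))
```

Fragments 3 and 4 are both free and adjacent, but they belong to different blocks, so the search skips them and returns the run starting at index 4.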
8. Fragments
• 2, 4, or 8 fragments per block (minimum fragment size is a disk sector, typically 512 bytes)
• Files never use more than one fragmented block
• Writing to a file that ends in a fragmented block either fills the space
already allocated (if room is available) or copies the fragments into a
newly allocated block.
• Expanding a file one fragment at a time causes frequent copying; writing
in full blocks is optimal.
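A small worked example shows how fragments reduce wasted space. This sketch assumes a 4096-byte block with 1024-byte fragments (one of the configurations measured later in the deck) and treats a tail that would need every fragment of a block as a full block; the function name `layout` is hypothetical.

```python
def layout(file_size, block_size=4096, frag_size=1024):
    """Return (full_blocks, tail_fragments, wasted_bytes) for a file.

    Only the last block of a file may be fragmented.
    """
    full_blocks, tail = divmod(file_size, block_size)
    frags = -(-tail // frag_size)            # ceiling division
    if frags == block_size // frag_size:     # tail needs a whole block anyway
        full_blocks, frags = full_blocks + 1, 0
    allocated = full_blocks * block_size + frags * frag_size
    return full_blocks, frags, allocated - file_size


print(layout(11000))  # 2 full blocks plus 3 fragments
```

An 11000-byte file occupies two full blocks and three fragments, wasting 264 bytes; with whole 4096-byte blocks only, the same file would take three blocks and waste 1288 bytes.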
9. Layout Optimizations
• Optimize for the processor and mass storage device (usually disk)
• Cylinder aware
• Chooses rotationally optimal blocks (either consecutive or delayed)
• Stores rotational layout tables to find positions with data already
written nearby
• Trade off between localizing data references and spreading unrelated
data across cylinder groups.
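The choice between "consecutive" and "delayed" placement can be sketched as a calculation: while the processor handles one transfer and schedules the next, the disk keeps spinning, so the next block should sit far enough ahead on the track to arrive just as the processor is ready. The parameter names and the sample numbers below are assumptions for illustration, not values from the deck.

```python
import math


def blocks_to_skip(rpm, blocks_per_track, cpu_overhead_ms):
    """Return how many block positions pass under the head while the
    processor prepares the next transfer (0 means place blocks consecutively).

    All three parameters are hypothetical tuning inputs.
    """
    ms_per_rev = 60_000 / rpm
    ms_per_block = ms_per_rev / blocks_per_track
    return math.ceil(cpu_overhead_ms / ms_per_block)


# Hypothetical disk: 3600 RPM, 8 blocks per track, 4 ms of processor overhead
print(blocks_to_skip(3600, 8, 4.0))
print(blocks_to_skip(3600, 8, 0.0))
```

With these sample numbers the next block would be placed two positions ahead; with no processor overhead, consecutive placement is rotationally optimal.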
10. Layout Policies: Inodes
• Inodes of files in a directory often accessed together
• For instance, ls reads every inode in the directory
• Keep inodes in same cylinder group
• When creating new directories, choose cylinder group with few
current inodes and directories
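One possible reading of the directory-placement rule, sketched below: among cylinder groups with at least the average number of free inodes, pick the one holding the fewest directories. The dictionary fields `free_inodes` and `dirs` and the function name are hypothetical, and the exact tie-breaking is an assumption.

```python
def pick_group_for_directory(groups):
    """Return the index of the cylinder group in which to create a new directory.

    groups: list of dicts with illustrative keys 'free_inodes' and 'dirs'.
    """
    avg_free = sum(g["free_inodes"] for g in groups) / len(groups)
    # At least one group is always at or above the average, so this is non-empty.
    candidates = [i for i, g in enumerate(groups) if g["free_inodes"] >= avg_free]
    return min(candidates, key=lambda i: groups[i]["dirs"])


groups = [
    {"free_inodes": 10, "dirs": 1},
    {"free_inodes": 90, "dirs": 5},
    {"free_inodes": 80, "dirs": 2},
]
print(pick_group_for_directory(groups))
```

Group 0 has the fewest directories but few free inodes, so the sketch picks group 2. This spreads directories across the disk while leaving room for the files each directory will later contain.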
11. Layout Policies: Data Blocks
• Place all data blocks for a file within the same cylinder group
• Preferably at rotationally optimal placements
• If file is greater than 48K (i.e., an indirect block is needed), move to
new cylinder group (you had to seek anyway...)
• Likewise for every MB thereafter
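The spill-over rule for large files can be sketched as a function from byte offset to "which cylinder group in sequence" the block lands in. The sketch assumes the one-megabyte intervals are counted from the 48K boundary, which the slide does not specify; the name `group_hop` is hypothetical.

```python
KB = 1024
MB = 1024 * KB


def group_hop(offset, first_limit=48 * KB, hop=MB):
    """Return 0 for data kept in the file's first cylinder group, 1 for the
    next group chosen, 2 for the one after that, and so on.

    Assumption: hops after the first occur every `hop` bytes past `first_limit`.
    """
    if offset < first_limit:
        return 0
    return 1 + (offset - first_limit) // hop


for off in (0, 48 * KB - 1, 48 * KB, 48 * KB + MB):
    print(off, group_hop(off))
```

Small files therefore stay entirely in one cylinder group, while a large file is spread out so that it cannot consume all the free space in any single group.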
12. So when you say “Fast” File System...
13. Read Throughput
Type            Processor/Bus   Speed (KB/s)   Max (KB/s)   Read bandwidth %   %CPU
Old 1024        750/UNIBUS      29             983          3                  11
New 4096/1024   750/UNIBUS      221            983          22                 43
New 8192/1024   750/UNIBUS      233            983          24                 29
New 4096/1024   750/MASSBUS     466            983          47                 73
New 8192/1024   750/MASSBUS     466            983          47                 54
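The bandwidth percentage column is simply the measured speed divided by the maximum rate of 983. A short check, using only the read figures from the table (the dictionary keys are labels made up for this sketch):

```python
MAX_RATE = 983  # maximum transfer rate from the table

# Measured read speeds from the table, keyed by illustrative labels
read_speeds = {
    "old 1024 / UNIBUS": 29,
    "new 4096/1024 / UNIBUS": 221,
    "new 8192/1024 / UNIBUS": 233,
    "new 4096/1024 / MASSBUS": 466,
    "new 8192/1024 / MASSBUS": 466,
}


def bandwidth_pct(speed, max_rate=MAX_RATE):
    """Return the share of the maximum rate actually achieved, as a whole percent."""
    return round(100 * speed / max_rate)


for name, speed in read_speeds.items():
    print(f"{name}: {bandwidth_pct(speed)}%")
```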
14. Write Throughput
Type            Processor/Bus   Speed (KB/s)   Max (KB/s)   Write bandwidth %   %CPU
Old 1024        750/UNIBUS      48             983          5                   29
New 4096/1024   750/UNIBUS      142            983          14                  43
New 8192/1024   750/UNIBUS      215            983          22                  46
New 4096/1024   750/MASSBUS     323            983          33                  94
New 8192/1024   750/MASSBUS     466            983          47                  95
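To compare old and new directly, the speeds in the read and write tables can be turned into speedup factors relative to the old file system. The figures are taken from the two tables; the labels and function name are made up for this sketch.

```python
OLD_READ, OLD_WRITE = 29, 48  # old file system speeds from the two tables

# (read speed, write speed) for each new configuration, from the tables
new_configs = {
    "4096/1024 UNIBUS": (221, 142),
    "8192/1024 UNIBUS": (233, 215),
    "4096/1024 MASSBUS": (466, 323),
    "8192/1024 MASSBUS": (466, 466),
}


def speedups(read, write):
    """Return (read speedup, write speedup) over the old file system, to one decimal."""
    return round(read / OLD_READ, 1), round(write / OLD_WRITE, 1)


for name, (r, w) in new_configs.items():
    print(name, speedups(r, w))
```

Note that the MASSBUS rows change the bus as well as the file system, so those factors are not a like-for-like comparison with the old UNIBUS measurement.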
15. Other metrics...
• When running ls on large directories containing other directories,
disk accesses for inodes are cut by a factor of two
• For large directories containing only files, they are cut by up to a factor of eight
• Transfer rates stable over time
• Throughput varies with amount of free space maintained (reduced by
half when system is full)
16. Other Enhancements
• Arbitrary length file names (ok, up to 255 characters)
• Advisory file locking
• Shared or exclusive
• Applied or removed only on open files
• Symbolic links, a la Multics
• Atomic rename operation
• Quotas
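Advisory locking on open files can be tried out from Python with `fcntl.flock`, which wraps the `flock` call on Unix-like systems. The sketch below assumes a local Unix file system and uses hypothetical helper names (`try_lock`, `demo`); it opens the same file twice so that the two descriptors behave like independent holders.

```python
import fcntl
import os
import tempfile


def try_lock(fd, exclusive):
    """Attempt a non-blocking advisory lock; return True if it was granted."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return True
    except OSError:
        return False


def demo():
    """Return the outcome of four lock attempts on two opens of one file."""
    with tempfile.NamedTemporaryFile() as tmp:
        a = os.open(tmp.name, os.O_RDWR)
        b = os.open(tmp.name, os.O_RDWR)
        try:
            results = [
                try_lock(a, exclusive=False),   # shared lock on a
                try_lock(b, exclusive=False),   # shared locks can coexist
            ]
            fcntl.flock(b, fcntl.LOCK_UN)       # b releases its shared lock
            results.append(try_lock(a, exclusive=True))  # a converts to exclusive
            results.append(try_lock(b, exclusive=True))  # b is refused while a holds it
            return results
        finally:
            os.close(a)
            os.close(b)


print(demo())
```

Because the locks are advisory, a process that never asks for a lock can still read or write the file; the scheme only coordinates programs that cooperate.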
17. Conclusions
• Taking advantage of disk geometry and access patterns resulted in an
improvement on the order of 10-fold in both read and write throughput
(best case: 29 to 466 for reads, 48 to 466 for writes)
• Improvements in block layout increased locality while reducing
wasted space
• Hardware matters!