A Fast File System for Unix
   Marshall K. McKusick, William N. Joy,
   Samuel J. Leffler and Robert S. Fabry

   Computer Systems Research Group, UCB



                                            Presented By:
CS 5204: Operating Systems, Virginia Tech   Parang Saraf
About the Paper
• Considered one of the most fundamental papers in
  operating systems

• Has been cited around 930 times

• Describes a new file system




                                                        2
Traditional File System
• File System developed at Bell Laboratories
• A file system is described by its Super-Block
  o Number of Data Blocks
  o Maximum number of files
  o Pointer to free list (linked list of all free blocks)

• Disk drive is divided into partitions
  o Each disk partition may contain one file system
  o A file system never spans multiple partitions

                                                            3
Traditional File System




                          4
Traditional File System – Inode
• Each file has a descriptor associated with it – Inode.

• Information includes:
  o Ownership of the file
  o Time stamps marking last modification and access time
  o Array of indices pointing to the data blocks
      Direct Blocks – 8
      Indirect Blocks – Singly, Doubly and Triply
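
The block-index array above determines how many data blocks one inode can reach. A minimal sketch of that arithmetic, assuming 4-byte block pointers (the pointer width is an assumption, not stated on the slide) and the 8 direct blocks listed above:

```python
def max_file_blocks(block_size, direct=8, pointer_size=4):
    """Count the data blocks reachable from one inode.

    pointer_size is an assumed value; `direct` follows the slide.
    """
    ptrs = block_size // pointer_size  # pointers held by one indirect block
    single = ptrs                      # singly indirect block
    double = ptrs ** 2                 # doubly indirect block
    triple = ptrs ** 3                 # triply indirect block
    return direct + single + double + triple


for block_size in (512, 1024, 4096):
    blocks = max_file_blocks(block_size)
    print(block_size, blocks, blocks * block_size)
```

Under these assumptions a 4096-byte block holds 1024 pointers, so the doubly indirect level alone covers 2^32 bytes; real limits also depend on the width of the file-offset field.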




                                                            5
Traditional File System – Inode




                                  6
Traditional File System – Inode




                                  7
Traditional File System – Problem
• Inode information segregated from Data
   o Long seek time from inode to its data


• Files in a single directory are typically not allocated consecutive
  slots for inode information
   o Many non-consecutive blocks of inodes are accessed when executing
     operations on inodes of several files in a directory

• Sub-optimum allocation of data blocks
   o Small Block size – 512 bytes
   o Many Seeks – Next sequential block is not on the same cylinder
   o Limited read-ahead
                                                                         8
Old File System
• Developed at Berkeley

• Increased Throughput
  •   Changing the basic block size from 512 bytes to 1024 bytes
  •   Each disk transfer accessed twice as much data
  •   Fewer indirect blocks used


• Increased Reliability
  •   Staging modifications to critical file system information so that they could
      either be completed or repaired cleanly after a crash


                                                                                     9
Old File System – Problem
• Old file system was still using just 4% of disk bandwidth

• Main problem – Scrambled Free List
   o Initially ordered for optimal access
   o Scrambled because files were created and removed
   o Eventually becomes entirely random – blocks allocated randomly
   o A freshly created file system provides transfer rates up to 175 KB/s
   o Rate deteriorates to 30 KB/s after a few weeks of moderate use


• Possible solutions – dump, rebuild and restore the file system, or periodically reorganize (defragment) the data on disk


                                                                      11
New File System
• Each disk drive contains one or more file systems

• A File System is described by its super-block, located at
  the beginning of the disk partition

• Super-block is replicated to protect against catastrophic
  loss

• Block size is any power of two >= 4096 bytes
  o Decided at the time of file system creation and can’t be changed
  o File Systems can have different block sizes
                                                                       12
New File System – Cylinder Groups
• Consists of one or more consecutive cylinders




                                                   13
New File System – Cylinder Groups
• Consists of one or more consecutive cylinders

• Disk partition is divided into one or more cylinder groups

• Has associated book-keeping information:
  o A redundant copy of super-block
  o Space for inodes
  o A bit map describing available blocks – replaces free list
  o Summary information describing usage of data blocks


                                                                 14
New File System – Cylinder Groups
• Contains static number of inodes:
   o Allocated at file system creation time
   o Default policy – one inode for each 2048 bytes of space in the cylinder group


• Book-keeping information begins at varying offset from the
  beginning of the cylinder group
   o Redundant information spirals down into the cylinder
   o Any single track, cylinder or platter can be lost without losing copies of the
     super-block




                                                                                      15
New File System – Structure




                              16
New File System – Key Contributions
• Optimizing storage utilization

• File System Parameterization

• Layout Policies




                                      17
Optimizing Storage Utilization
• New 4096-byte blocks – each disk transfer moves up to four times as much data as with 1024-byte blocks

• Problem with large blocks:
  o Wasted space due to small files




                                                  18
Optimizing Storage Utilization
• Solution:
  o Divide each 4096-byte block into 2, 4 or 8 fragments to accommodate small files
  o Fragment size is specified at the time the file system is created
  o Block map records the space available at the fragment level
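
A fragment-level block map can be sketched as a list of booleans, one per fragment. The sketch below assumes 4 fragments per block (4096-byte blocks, 1024-byte fragments); the function name and the list representation are illustrative, not the on-disk format.

```python
FRAGS_PER_BLOCK = 4  # assumed: 4096-byte blocks with 1024-byte fragments


def find_free_run(bitmap, count, frags_per_block=FRAGS_PER_BLOCK):
    """Return the index of the first run of `count` free fragments that
    lies inside a single block, or None if there is no such run.

    bitmap[i] is True when fragment i is free.
    """
    for block_start in range(0, len(bitmap), frags_per_block):
        run = 0  # runs never continue across a block boundary
        block_end = min(block_start + frags_per_block, len(bitmap))
        for i in range(block_start, block_end):
            run = run + 1 if bitmap[i] else 0
            if run == count:
                return i - count + 1
    return None


# Two blocks: fragments 1 and 3 are free in the first, 4..6 in the second.
bitmap = [False, True, False, True, True, True, True, False]
print(find_free_run(bitmap, 2))
```

Fragments 3 and 4 are both free but belong to different blocks, so the search skips them: a file's fragments must sit within one block.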




                                                                                19
Optimizing Storage Utilization
• Free List vs Bitmap




                                 20
Optimizing Storage Utilization
• Space allocation:
  o Space is allocated when a program does a write system call
  o Three possible conditions:
      Enough space left in an already allocated block or fragment
      File contains no fragmented blocks – allocate new blocks and fragments
      File contains one or more fragmented blocks but has insufficient space
       to hold new data – new block is allocated, old fragments are copied and
       new fragments are appended
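
The three conditions above can be illustrated by computing how many full blocks and tail fragments a file occupies before and after a write. This is a simplified sketch: the function names are hypothetical, and it ignores the possibility of extending fragments in place.

```python
def layout(file_size, block_size=4096, frag_size=1024):
    """Return (full_blocks, tail_fragments) needed to hold file_size bytes."""
    frags_per_block = block_size // frag_size
    full_blocks, tail = divmod(file_size, block_size)
    tail_frags = -(-tail // frag_size)  # ceiling division
    if tail_frags == frags_per_block:   # a full set of fragments is a block
        full_blocks, tail_frags = full_blocks + 1, 0
    return full_blocks, tail_frags


def classify_write(old_size, new_size, **geometry):
    """Name which of the three conditions a growing write falls under."""
    old = layout(old_size, **geometry)
    new = layout(new_size, **geometry)
    if new == old:
        return "fits in allocated space"
    if old[1] == 0:
        return "allocate new blocks and fragments"
    return "copy old fragments, then extend"


print(layout(11000))  # two full blocks plus a three-fragment tail
```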




                                                                                 21
Optimizing Storage Utilization
• Free space reserve
  o Minimum acceptable percentage of file system blocks that should
    be free – typically 10% (the file system is kept at most 90% full)
  o Below this level only the system administrator can allocate blocks
  o Important for the layout policies to be effective
  o If free blocks run out, file system throughput tends to be cut in half
    because of the inability to localize blocks in a file
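
The reserve check can be sketched as a single predicate. The 10% figure matches the free-space setting cited with the layout policies; the function and parameter names are hypothetical.

```python
def may_allocate(free_blocks, total_blocks, is_superuser=False,
                 minfree_percent=10):
    """Decide whether a block allocation request may proceed.

    Ordinary users are refused once free space is down to the reserve;
    the administrator may keep allocating until no blocks remain.
    """
    if free_blocks <= 0:
        return False
    if is_superuser:
        return True
    return free_blocks * 100 > total_blocks * minfree_percent
```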




                                                                          22
Optimizing Storage Utilization
• Wasted space comparison
  o A new file system with 4096-byte blocks and 1024-byte fragments wastes
    about the same space as the old file system with 1024-byte blocks
  o New file system uses less space for indexing large files
  o Uses the same amount of space for small files
  o Free space reserve should also be counted as wasted space




                                                                     23
File System Parameterization
• Optimum block allocation based on hardware parameters
  o Speed of Processor
  o Hardware support for mass storage transfers
  o Characteristics of the mass storage devices


• Blocks are allocated on the same cylinder

• Block allocation depends on whether the processor has
  an input/output channel or not


                                                          24
File System Parameterization
 Which data can be accessed faster?




                                   25
File System Parameterization
 Which data can be accessed faster?
   It depends on whether the processor has an I/O channel



                                                     26
File System Parameterization
• Rotationally Optimal Blocks
    o Processors without I/O channels must field an interrupt and then prepare for a
      new disk transfer
    o Disk rotates during this time
    o Place blocks such that disk rotation is taken into account before the start of a
      new disk transfer operation

• Cylinder group summary information includes count of
  blocks based on different rotational positions – 8 positions
• Super-block contains a vector of lists called
  Rotational Layout Tables – used by the system when
  allocating new blocks
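
How far apart two logically consecutive blocks should sit on a track follows from the rotation speed and the time the processor needs to start the next transfer. A minimal sketch of that calculation; the function name and the example figures are illustrative assumptions, not measurements from the paper.

```python
import math


def blocks_to_skip(rpm, blocks_per_track, service_ms):
    """Number of block positions that pass under the head while the
    processor fields the interrupt and sets up the next transfer."""
    ms_per_revolution = 60_000 / rpm
    ms_per_block = ms_per_revolution / blocks_per_track
    return math.ceil(service_ms / ms_per_block)


# Assumed example: 3600 rpm disk, 8 blocks per track, 4 ms service time.
print(blocks_to_skip(3600, 8, 4))
# With an I/O channel the setup time is effectively zero, so the next
# block can be placed immediately after the current one.
print(blocks_to_skip(3600, 8, 0))
```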
                                                                                         27
File System Parameterization




                               28
Layout Policies
• Layout policies divided into two distinct parts:
  o Global Policies
  o Local Allocation Routines


• Two allocable resources:
  o   Inodes
  o   Data Blocks




                                                     29
Layout Policies
• Global Policies
  o Uses file system wide summary information to make decisions
    regarding the placement of new inodes and data blocks
  o Tries to localize data that is concurrently accessed while spreading
    out unrelated data
  o Inodes:
      Places all inodes of files in a directory in the same cylinder group
      A new directory is placed in a cylinder group that has a greater than
       average number of free inodes and the smallest number of directories
       already in it – ensures that files are distributed throughout the disk
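
The directory placement rule can be sketched as a filter followed by a minimum. The per-group tuples are a hypothetical stand-in for the cylinder group summary information, and the sketch accepts groups with at least the average number of free inodes so that a candidate always exists.

```python
def pick_group_for_directory(groups):
    """groups: list of (free_inodes, directory_count), one per cylinder group.

    Returns the index of the group chosen for a new directory.
    """
    average_free = sum(free for free, _ in groups) / len(groups)
    # At-least-average (an assumption) guarantees a non-empty candidate list.
    candidates = [i for i, (free, _) in enumerate(groups)
                  if free >= average_free]
    # Among the candidates, prefer the group with the fewest directories.
    return min(candidates, key=lambda i: groups[i][1])


groups = [(10, 5), (50, 3), (40, 1), (5, 0)]
print(pick_group_for_directory(groups))
```

Group 3 has the fewest directories but too few free inodes, so the choice falls on group 2.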



                                                                                30
Layout Policies
• Global Policies
  o Data Blocks:
      Tries to place all data blocks for a file in the same cylinder group
      None of the cylinder groups should ever become completely full
      Heuristic Solution – redirect block allocation to a different cylinder group
       when a file exceeds 48 KB and at every megabyte thereafter
      Keeps the cost of one long seek per megabyte small
      New cylinder groups are chosen from those cylinder groups that have a
       greater than average number of free blocks left
      Finally it calls Local Allocation Routines for block allocation
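
The redirection heuristic can be sketched in two parts: the file offsets at which allocation moves to a new cylinder group, and the choice of that group. The sketch assumes the later switch points fall at 48 KB plus whole megabytes and that groups are scanned in order from the current one; both are assumptions, as are the names.

```python
KB = 1024
MB = 1024 * KB


def switch_points(file_size, first=48 * KB, step=MB):
    """Byte offsets at which allocation moves to another cylinder group."""
    points = []
    offset = first
    while offset < file_size:
        points.append(offset)
        offset += step
    return points


def pick_next_group(free_blocks, current):
    """Return the next group, scanning from `current`, whose free block
    count is at least the average over all groups."""
    count = len(free_blocks)
    average = sum(free_blocks) / count
    for step in range(1, count + 1):
        group = (current + step) % count
        if free_blocks[group] >= average:
            return group


print(switch_points(2 * MB))
print(pick_next_group([10, 2, 30, 40], current=0))
```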



                                                                                      31
Layout Policies
• Local Allocation Routines
  o Allocates a free block as requested by the Global layout policies
  o Uses a four level allocation
  o First Level – use the next free block that is rotationally closest to the requested
    block on the same cylinder




                                         Cylinder 0




                                                                                          32
Layout Policies
• Local Allocation Routines
  o Second Level – if there are no free blocks on the same cylinder, a free block
    in the same cylinder group is selected




                                     Cylinder 0

                                                               Cylinder Group

                                    Cylinder 1




                                                                                    33
Layout Policies
• Local Allocation Routines
  o Third Level – if the cylinder group is full, use the quadratic hash function to
    hash the cylinder group number to find another cylinder group to look for a
    free block
  o Fourth Level – if the hash fails, use an exhaustive search on all cylinder
    groups

• Quadratic Hash
  o Used because of its speed in finding unused slots in nearly full hash tables
  o File systems parameterized to maintain 10% free space rarely use this
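
The third and fourth levels can be sketched as a rehash over cylinder group numbers followed by a linear scan. The probe sequence below (steps of 1, 2, 4, ...) is one possible choice and is an assumption; `has_space` is a hypothetical per-group flag.

```python
def find_group_with_space(has_space, preferred):
    """Return a cylinder group number with free blocks, or None."""
    count = len(has_space)
    if has_space[preferred]:
        return preferred
    # Third level: rehash with a doubling step from the preferred group.
    group, step = preferred, 1
    while step < count:
        group = (group + step) % count
        if has_space[group]:
            return group
        step *= 2
    # Fourth level: exhaustive search of every other group.
    for offset in range(1, count):
        group = (preferred + offset) % count
        if has_space[group]:
            return group
    return None


has_space = [False] * 8
has_space[5] = True
print(find_group_with_space(has_space, preferred=0))
```

With 8 groups the rehash from group 0 probes groups 1, 3 and 7, misses group 5, and the exhaustive search then finds it.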




                                                                                      34
Performance
• Measured Throughput




                        35
Performance
• List Directory command performance
  o For large directories containing many directories, disk access for inodes is cut
    by a factor of two
  o For large directories containing only files, disk access for inodes is cut by a
    factor of eight

• Both reads and writes are faster in new file system
  o   Because larger block sizes are used
  o   Allocating a block costs more, but the allocation cost per byte is about the same
  o   Reading rate is always at least as fast as writing rate
         With 4096-byte blocks writes are slower than reads; with 8192-byte
          blocks the two rates are about the same
         In the old file system writing was about 50% faster than reading

                                                                                       36
New File System - Limitations
• Throughput is limited by the memory-to-memory copy operations required
  to move data from disk buffers in the system’s address
  space to data buffers in the user’s address space
  o Copying could be avoided if buffers in both address spaces were properly aligned


• Only one block is allocated to a file at a time
  o Possible improvement: pre-allocate several blocks at once and release unused ones when the file is closed




                                                                                    37
Functional Enhancements
• Long File Name

• File Locking

• Symbolic Links

• Rename

• Quotas


                          38
Long File Name
• Maximum length of file name is 255 characters

• Directories are allocated in 512-byte units called chunks

• Chunks are broken into Directory Entries:
  o Each entry contains the information needed to map a file name to its inode
  o First three fields are fixed length – inode number, size of entry and length of
    file name
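
A variable-length directory entry with three fixed-length leading fields can be sketched with Python's struct module. The field widths (4-byte inode number, 2-byte entry size, 2-byte name length), the little-endian byte order and the 4-byte name padding are assumptions for illustration.

```python
import struct

# Assumed layout: inode number, size of entry, length of file name.
HEADER = struct.Struct("<IHH")
MAX_NAME = 255


def pack_entry(inode, name):
    """Build one directory entry: fixed header followed by the padded name."""
    raw = name.encode()
    if len(raw) > MAX_NAME:
        raise ValueError("file name longer than 255 characters")
    padded = (len(raw) + 1 + 3) & ~3  # NUL terminator, rounded up to 4 bytes
    size = HEADER.size + padded
    return HEADER.pack(inode, size, len(raw)) + raw.ljust(padded, b"\0")


def unpack_entries(data):
    """Walk a run of entries, using each entry's size to find the next."""
    offset = 0
    while offset < len(data):
        inode, size, name_length = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        yield inode, data[start:start + name_length].decode()
        offset += size


entries = pack_entry(2, ".") + pack_entry(7, "notes.txt")
print(list(unpack_entries(entries)))
```

Because each entry records its own size, a reader can step through a chunk without knowing name lengths in advance.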




                                                                                      39
File Locking
• Hard Lock – always enforced when a program tries to
  access a file

• Advisory shared or exclusive locks – requested by the
  programs

• System administrator privilege can override locks

• No deadlock detection is attempted
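
Advisory locks of this kind are available today through the flock call, which Python exposes as fcntl.flock on Unix-like systems. A minimal sketch of a non-blocking exclusive lock attempt; the helper name is hypothetical.

```python
import fcntl


def try_exclusive_lock(open_file):
    """Try to take an advisory exclusive lock without blocking.

    Returns True on success, False if another open file holds the lock.
    """
    try:
        fcntl.flock(open_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False
```

The lock is advisory: it only constrains programs that also request a lock, and it is released with fcntl.LOCK_UN or when the file is closed.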



                                                          40
Symbolic Links
• A symbolic link is implemented as a file that contains a
  pathname

• Pathname can be relative or absolute

• On encountering a symbolic link while interpreting a
  component of a pathname, the contents of the symbolic
  link are prepended to the rest of the pathname
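
The prepend step can be sketched as plain string handling. The function and parameter names are hypothetical; a real lookup would continue resolving the returned path component by component.

```python
def expand_symlink(link_dir, link_target, rest):
    """Return the pathname to keep resolving after hitting a symbolic link.

    link_dir:    directory that contains the link
    link_target: pathname stored in the link
    rest:        components that followed the link in the original path
    """
    path = link_target + ("/" + rest if rest else "")
    if link_target.startswith("/"):
        return path  # absolute target: lookup restarts at the root
    # relative target: lookup continues from the link's own directory
    return link_dir.rstrip("/") + "/" + path


print(expand_symlink("/usr", "/var/lib", "x/y"))
print(expand_symlink("/usr", "local/src", "x"))
```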




                                                             41
Rename
• Old file system required three system calls for renaming

• Target file could be left with only a temporary name after a crash

• New rename system call added that guarantees the
  existence of the target name

• Renaming works on both directories and files
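
A common use of a single rename call is replacing a file's contents without a window in which the target name is missing. A minimal sketch for POSIX systems, where os.rename maps to the rename system call; the helper name is hypothetical.

```python
import os
import tempfile


def replace_file(path, data):
    """Write data to a temporary file, then rename it over `path`.

    The temporary file is created in the same directory so that the
    rename stays within one file system.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(data)
    os.rename(temp_path, path)  # target name exists before and after
```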




                                                               42
Quotas
• Old file system – any single user can allocate all the
  available space in the file system

• Quota restricts the amount of file system resources that a
  user can obtain

• Sets limits on both inodes and disk blocks

• Hard and soft limits
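
The difference between the two limits can be sketched as a three-way check on a request. The names and return values are hypothetical, and the sketch leaves out how long a user may stay over the soft limit.

```python
def check_quota(usage, request, soft_limit, hard_limit):
    """Classify a request for more blocks (or inodes) against a quota."""
    new_usage = usage + request
    if new_usage > hard_limit:
        return "denied"   # the hard limit can never be exceeded
    if new_usage > soft_limit:
        return "warning"  # allowed, but the user is told to clean up
    return "ok"
```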



                                                               43
Key Take-Away points
• Substantially higher throughput rates – large block size

• Flexible allocation policies
  o Better locality of reference
  o Less wastage


• Adapted to wide range of peripheral and processor
  characteristics



                                                             44
References
• Presentation on “A Fast File System” by:
   o Zhifei Wang : www.cs.pdx.edu/~walpole/class/cs533/spring2006/slides/191.ppt
   o pdc-amd01.poly.edu/~wein/cs6243/ppts/fastfile.ppt
   o Sean Mondesire and Subramanian Kasi :
      www.cs.ucf.edu/courses/cop5611/spring05/item/FFS.ppt
   o www.scs.ryerson.ca/~aabhari/File_System.ppt

• http://flylib.com/books/en/3.224.1.79/1/
• http://osr507doc.sco.com/en/HANDBOOK/graphics/harddisk.gif


                                                                                   45
