AOS Lab 10: File system -- Inodes and beyond

  1. 1. Lab 10: File system – Inodes and beyond Advanced Operating Systems Zubair Nabi zubair.nabi@itu.edu.pk April 10, 2013
  2. 2. Recap of Lab 9: xv6 FS layers • System calls: File descriptors • Pathnames: Recursive lookup • Directories: Directory inodes • Files: Inodes and block allocator • Transactions: Logging • Blocks: Buffer cache
  6. 6. Recap of Lab 9: xv6 FS layers (2) 1 Buffer cache: Reads and writes blocks on the IDE disk and synchronizes access to disk blocks • Ensures that only one kernel process can edit any particular block at a time 2 Logging: Ensures atomicity by enabling higher layers to wrap updates to several blocks in a transaction 3 Inodes and block allocator: Provides unnamed files; each unnamed file is represented by an inode and a sequence of blocks holding the file content
  10. 10. Recap of Lab 9: xv6 FS layers (3) 4 Directory inodes: Implements directories as a special kind of inode • The content of this inode is a sequence of directory entries, each of which contains a name and a reference to the named file’s inode 5 Recursive lookup: Provides hierarchical path names such as /foo/bar/baz.txt, via recursive lookup 6 File descriptors: Abstracts many Unix resources, such as pipes, devices, and files, using the file system interface
  15. 15. Recap of Lab 9: File system layout • xv6 lays out inodes and content blocks on the disk by dividing the disk into several sections, in this order: boot (block 0), super (block 1), inodes (from block 2), bitmap, data, log • Block 0 holds the boot sector • Block 1 (called the superblock) contains metadata about the file system: the file system size in blocks, the number of data blocks, the number of inodes, and the number of blocks in the log • Blocks starting at 2 hold inodes, with multiple inodes per block
  18. 18. Recap of Lab 9: File system layout • Sections in disk order: boot (block 0), super (block 1), inodes (from block 2), bitmap, data, log • Inode blocks are followed by bitmap blocks, which keep track of data blocks in use • Bitmap blocks are followed by data blocks, which hold file and directory contents • Finally, the blocks at the end of the disk hold a log, which is required by the transaction layer
  22. 22. Inodes • Have two variants: 1 On-disk data structure containing a file’s size and list of data block numbers 2 In-memory version of the on-disk inode, along with extra information needed within the kernel • All on-disk inodes are stored in a contiguous area of disk, between the superblock and the bitmap blocks • Each inode has the same size, so given a number n (called the inode number or i-number), it is simple to locate the corresponding inode
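
Because every on-disk inode has the same size, locating inode n takes one division and one remainder. The following is a minimal sketch, assuming 512-byte blocks and an inode area that starts at block 2, as in the layout recap. The helper names inode_block and inode_offset are hypothetical and are not xv6 functions.

```c
#include <stdint.h>

#define BSIZE       512   /* assumed block size in bytes */
#define NDIRECT     12
#define INODE_START 2     /* first inode block, per the layout recap */

/* Same field layout as the struct dinode shown in these slides. */
struct dinode {
  short    type;
  short    major;
  short    minor;
  short    nlink;
  uint32_t size;
  uint32_t addrs[NDIRECT + 1];
};

/* Inodes per block. */
#define IPB (BSIZE / sizeof(struct dinode))

/* Disk block that holds inode number inum. */
uint32_t inode_block(uint32_t inum) {
  return (uint32_t)(inum / IPB) + INODE_START;
}

/* Byte offset of inode number inum inside that block. */
uint32_t inode_offset(uint32_t inum) {
  return (uint32_t)((inum % IPB) * sizeof(struct dinode));
}
```

With 2-byte shorts and 4-byte unsigned integers, a dinode occupies 64 bytes, so each block holds 8 inodes and inode 8 is the first inode of block 3.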
  28. 28. On-disk inodes • Represented by struct dinode • Contains a type field to distinguish between files, directories, and special files (devices) • A type of zero indicates that the dinode is free • Also keeps track of the number of directory entries that refer to this inode (nlink) • This link count dictates when the inode should be freed • Also has fields to hold the number of bytes of content and the block numbers of the disk blocks holding that content
  29. 29. Code: dinode
        struct dinode {
          short type;              // File type
          short major;             // Major device number (T_DEV only)
          short minor;             // Minor device number (T_DEV only)
          short nlink;             // Number of links to inode in file system
          uint size;               // Size of file (bytes)
          uint addrs[NDIRECT+1];   // Data block addresses
        };
        #define T_DIR  1   // Directory
        #define T_FILE 2   // File
        #define T_DEV  3   // Device
  35. 35. In-memory inodes • Represented by struct inode • An inode is kept in memory only if there are C pointers referring to it • These pointers come from file descriptors, current working directories, and kernel code • The iget and iput functions acquire and release pointers to an inode • A pointer obtained from iget() acts as a weak form of lock: it guarantees that the inode stays in the cache until its reference count drops to zero • These pointers enable long-term references (open files and the current directory) and help prevent deadlock in code that manipulates multiple inodes (pathname lookup)
  36. 36. Code: inode
        struct inode {
          uint dev;     // Device number
          uint inum;    // Inode number
          int ref;      // Reference count
          int flags;    // I_BUSY, I_VALID
          short type;   // copy of disk inode
          short major;
          short minor;
          short nlink;
          uint size;
          uint addrs[NDIRECT+1];
        };
        #define I_BUSY  0x1
        #define I_VALID 0x2
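
The reference counting behind iget and iput can be illustrated with a very small cache. This is a simplified sketch, not the xv6 code: the names cache_get and cache_put, the cached_inode struct, and the four-slot table are all assumptions made for illustration. Locking, the flags field, and reading the inode from disk are left out.

```c
#include <stddef.h>

#define NINODE 4   /* tiny cache size, chosen for illustration */

struct cached_inode {
  unsigned dev;    /* device number */
  unsigned inum;   /* inode number */
  int      ref;    /* number of C pointers handed out */
};

static struct cached_inode icache[NINODE];

/* Return the cached entry for (dev, inum), taking a new reference.
   If the inode is not cached, recycle a slot whose ref is zero.
   Returns NULL when every slot is in use. */
struct cached_inode *cache_get(unsigned dev, unsigned inum) {
  struct cached_inode *empty = NULL;
  for (int i = 0; i < NINODE; i++) {
    struct cached_inode *ip = &icache[i];
    if (ip->ref > 0 && ip->dev == dev && ip->inum == inum) {
      ip->ref++;            /* already cached: share the same entry */
      return ip;
    }
    if (empty == NULL && ip->ref == 0)
      empty = ip;           /* remember the first free slot */
  }
  if (empty == NULL)
    return NULL;
  empty->dev = dev;
  empty->inum = inum;
  empty->ref = 1;
  return empty;
}

/* Drop one reference; at zero the slot may be recycled by cache_get. */
void cache_put(struct cached_inode *ip) {
  if (ip->ref > 0)
    ip->ref--;
}
```

Two calls to cache_get with the same device and inode number return the same pointer, which is why holding a pointer keeps the entry in the cache until every holder has called cache_put.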
  40. 40. Inode locks and allocation • To ensure that an inode has valid content, the code must read it from disk • This read must be bracketed by calls to ilock and iunlock • This allows multiple processes to hold a C pointer to an inode, but only one process can lock it at a time • Inodes are allocated via ialloc, which works similarly to balloc
  45. 45. Inode data • Data is found in the blocks pointed to by the addrs field • Size of addrs is NDIRECT+1 where NDIRECT is 12 • The first NDIRECT entries of addrs are direct blocks and can refer to 6KB of data (12 blocks of 512 bytes) • The 13th entry in addrs points to the indirect block, which holds NINDIRECT (128) block addresses and so refers to 64KB of data • Therefore, while fixed-size blocks simplify lookup, the maximum size of a file in xv6 is 70KB
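
The 70KB limit follows directly from the constants. The sketch below redoes the arithmetic, assuming 512-byte blocks and 4-byte block addresses. The function max_file_bytes is a hypothetical helper, not part of xv6.

```c
#include <stdint.h>

#define BSIZE     512u                          /* assumed block size in bytes */
#define NDIRECT   12u                           /* direct block addresses */
#define NINDIRECT (BSIZE / sizeof(uint32_t))    /* addresses in one indirect block */
#define MAXFILE   (NDIRECT + NINDIRECT)         /* largest file, in blocks */

/* Largest file size in bytes: (12 + 128) blocks of 512 bytes. */
unsigned long max_file_bytes(void) {
  return (unsigned long)MAXFILE * BSIZE;
}
```

Here 12 direct blocks give 6KB, 128 indirect entries give 64KB, and the sum is 140 blocks, or 71,680 bytes.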
  49. 49. Inode content • bmap(struct inode *ip, uint bn) returns the disk address of the bn-th block within inode ip, hiding the distinction between direct and indirect blocks • If the data block does not exist, it is allocated • itrunc(struct inode *ip) truncates inode ip: it frees the inode's data blocks, both direct and indirect, and resets its size to zero • readi(struct inode *ip, char *dst, uint off, uint n) reads n bytes from inode ip, starting at byte offset off, into dst
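
The mapping that bmap performs can be sketched against an in-memory stand-in for the disk. This is an illustration under stated assumptions, not the xv6 implementation: the disk array, alloc_block, mini_inode, and mini_bmap are hypothetical names, the allocator simply hands out the next unused block, and an out-of-range block number returns 0 here, whereas xv6 panics.

```c
#include <stdint.h>

#define BSIZE     512
#define NDIRECT   12
#define NINDIRECT (BSIZE / sizeof(uint32_t))
#define NBLOCKS   256   /* size of the pretend disk, in blocks */

/* Pretend disk: each block viewed as an array of 32-bit words. */
static uint32_t disk[NBLOCKS][BSIZE / sizeof(uint32_t)];
static uint32_t next_free = 1;   /* block 0 is reserved so 0 means "none" */

/* Hand out the next unused block, or 0 when the disk is full. */
static uint32_t alloc_block(void) {
  return next_free < NBLOCKS ? next_free++ : 0;
}

struct mini_inode {
  uint32_t addrs[NDIRECT + 1];   /* 12 direct entries plus 1 indirect */
};

/* Return the disk block holding logical block bn of the file,
   allocating it (and the indirect block) on first use. */
uint32_t mini_bmap(struct mini_inode *ip, uint32_t bn) {
  if (bn < NDIRECT) {
    if (ip->addrs[bn] == 0)
      ip->addrs[bn] = alloc_block();
    return ip->addrs[bn];
  }
  bn -= NDIRECT;
  if (bn < NINDIRECT) {
    if (ip->addrs[NDIRECT] == 0)
      ip->addrs[NDIRECT] = alloc_block();
    uint32_t ib = ip->addrs[NDIRECT];
    if (ib == 0)
      return 0;                   /* no space for the indirect block */
    uint32_t *table = disk[ib];   /* indirect block: array of addresses */
    if (table[bn] == 0)
      table[bn] = alloc_block();
    return table[bn];
  }
  return 0;                       /* beyond the maximum file size */
}
```

Callers such as readi and writei only ever pass a logical block number, so they never need to know whether a block is reached directly or through the indirect block.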
  54. 54. Inode content (2) • writei(struct inode *ip, char *src, uint off, uint n) works similarly to readi but it: 1 Copies data in instead of out 2 Extends the file if the write increases its size 3 Updates the size in the inode • stati(struct inode *ip, struct stat *st) copies metadata of inode ip into st, which is exposed to userspace via the stat system call
  57. 57. Directory layer • A directory is a file with an inode type T_DIR and data in the form of a sequence of directory entries • Each entry is a struct dirent
        #define DIRSIZ 14
        struct dirent {
          ushort inum;         // free, if zero
          char name[DIRSIZ];
        };
  60. 60. dirlookup • Searches a directory for an entry with the given name • Signature: struct inode* dirlookup(struct inode *dp, char *name, uint *poff) • If it finds one, it returns a pointer to the corresponding inode (obtained via iget, unlocked) and stores the byte offset of the entry within the directory in *poff
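
The search itself is a linear scan over fixed-size entries. The sketch below works on an in-memory array rather than on directory blocks read through readi, and it returns the inode number instead of an inode pointer. The names xv6_dirent and dir_find are hypothetical; the struct mirrors the dirent layout from the directory layer slide.

```c
#include <stdint.h>
#include <string.h>

#define DIRSIZ 14

/* Same layout as the struct dirent shown in the slides. */
struct xv6_dirent {
  uint16_t inum;           /* 0 means the entry is free */
  char     name[DIRSIZ];   /* not NUL-terminated when exactly DIRSIZ long */
};

/* Scan count entries for name. On success, return the inode number and,
   if poff is non-NULL, store the entry's byte offset in *poff.
   Return 0 when no entry matches. */
uint16_t dir_find(const struct xv6_dirent *ents, uint32_t count,
                  const char *name, uint32_t *poff) {
  for (uint32_t i = 0; i < count; i++) {
    if (ents[i].inum == 0)
      continue;            /* skip free entries */
    if (strncmp(ents[i].name, name, DIRSIZ) == 0) {
      if (poff != NULL)
        *poff = i * (uint32_t)sizeof(struct xv6_dirent);
      return ents[i].inum;
    }
  }
  return 0;
}
```

The offset matters to callers that later overwrite the entry in place, for example when a name is removed from a directory.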
  62. 62. dirlink • Writes a new directory entry with the given name and inode number into dp • Signature: int dirlink(struct inode *dp, char *name, uint inum)
  66. 66. Path names • Path name lookup is performed by a sequence of calls to dirlookup, one for each path component • namei takes a path and returns the corresponding inode • nameiparent is similar but returns the inode of the parent directory • Both make a call to namex internally
  71. 71. namex • Starts by deciding where the path evaluation begins • If the path begins with /, evaluation starts at the root • Otherwise, it starts at the current directory • Uses skipelem to split the path into path elements • In each iteration (one per path element), looks up the element's name in the current directory inode, and continues until it reaches the required inode, which it returns
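
The path-splitting step can be sketched in user space as follows. This is written in the spirit of skipelem but is not the xv6 function: the name next_elem is hypothetical, and unlike xv6 it always NUL-terminates the element, so the caller must supply a buffer of DIRSIZ + 1 bytes.

```c
#include <string.h>

#define DIRSIZ 14

/* Copy the next path element into name (at most DIRSIZ characters,
   always NUL-terminated) and return a pointer to the remainder of the
   path with leading slashes removed. Return NULL when no element
   is left. name must have room for DIRSIZ + 1 bytes. */
const char *next_elem(const char *path, char *name) {
  while (*path == '/')
    path++;                      /* skip leading slashes */
  if (*path == '\0')
    return NULL;                 /* nothing left to parse */
  const char *start = path;
  while (*path != '/' && *path != '\0')
    path++;                      /* advance to the end of the element */
  size_t len = (size_t)(path - start);
  if (len > DIRSIZ)
    len = DIRSIZ;                /* over-long names are truncated */
  memcpy(name, start, len);
  name[len] = '\0';
  while (*path == '/')
    path++;                      /* skip trailing slashes */
  return path;
}
```

A lookup loop would call next_elem repeatedly, searching the current directory for each element, until it returns NULL.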
  74. 74. File descriptor layer • Everything in Unix is a file and this interface is enabled by the file descriptor layer • Each process has its own open files (or file descriptor) table • Each open file is represented by struct file
  75. 75. Code: struct file
        struct file {
          enum { FD_NONE, FD_PIPE, FD_INODE } type;
          int ref;             // reference count
          char readable;
          char writable;
          struct pipe *pipe;
          struct inode *ip;
          uint off;
        };
  83. 83. file • struct file is simply a wrapper around an inode or a pipe, plus an I/O offset • Each call to open creates a new struct file • If multiple processes open the same file independently, each has its own struct file for it, with its own I/O offset • The same struct file can appear multiple times (a) within one process’s file table, and (b) across multiple processes • When would this happen? • (a) happens when a process opens a file and then duplicates the descriptor with dup; (b) happens when a process calls fork • The reference count tracks the number of references to a particular open file • Read/write access is tracked by the readable/writable fields
  89. 89. Global file table • All open files in the system are kept within a global file table (ftable) • ftable has corresponding functions to: 1 Allocate a file: filealloc 2 Create a duplicate reference: filedup 3 Release a reference: fileclose 4 Read from a file: fileread 5 Write to a file: filewrite
  93. 93. File manipulation • filealloc: Scans the file table for an unreferenced file (f->ref == 0) and returns a new reference • filedup: Increments the reference count • fileclose: Decrements the reference count • When the count reaches zero, the underlying pipe or inode is released
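
The three table operations reduce to reference counting over a fixed array. The sketch below is a simplification under stated assumptions: file_alloc, file_dup, file_close, and the pared-down open_file struct are hypothetical names, there is no locking, and releasing the underlying pipe or inode is only indicated by a comment.

```c
#include <stddef.h>

#define NFILE 8   /* table size, chosen for illustration */

struct open_file {
  int      ref;   /* number of descriptors sharing this entry */
  unsigned off;   /* shared I/O offset */
};

static struct open_file ftable[NFILE];

/* Find an unreferenced slot and return it with ref set to 1.
   Returns NULL when the table is full. */
struct open_file *file_alloc(void) {
  for (int i = 0; i < NFILE; i++) {
    if (ftable[i].ref == 0) {
      ftable[i].ref = 1;
      ftable[i].off = 0;
      return &ftable[i];
    }
  }
  return NULL;
}

/* Take another reference to the same entry (what dup and fork need). */
struct open_file *file_dup(struct open_file *f) {
  f->ref++;
  return f;
}

/* Drop one reference. When ref reaches zero the slot is free again;
   the real kernel would release the underlying pipe or inode here. */
void file_close(struct open_file *f) {
  if (f->ref > 0)
    f->ref--;
}
```

Because file_dup returns the same entry, descriptors created by dup or inherited across fork share a single offset, while a second file_alloc yields an independent one.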
  98. 98. File manipulation (2) • filestat: Invokes stati after ensuring that the file represents an inode • fileread and filewrite: 1 Check whether the operation is allowed by the open mode 2 Pass the call through to either the underlying pipe or inode implementation 3 If the file wraps an inode, the I/O offset is used and then advanced 4 Pipes have no concept of offset
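
The inode branch of a read can be sketched as: check the mode, copy from the current offset, then advance the offset. The sketch below covers only that branch and uses an in-memory byte array in place of an inode. The names mem_inode, open_file, and file_read are assumptions for illustration, and the pipe case is omitted.

```c
#include <string.h>

/* Stand-in for an inode: a byte array and its length. */
struct mem_inode {
  const char *data;
  unsigned    size;
};

struct open_file {
  int               readable;   /* set from the open mode */
  struct mem_inode *ip;
  unsigned          off;        /* current I/O offset */
};

/* Read up to n bytes at the file's current offset into dst.
   Returns the number of bytes read, 0 at end of file,
   or -1 if the file was not opened for reading. */
int file_read(struct open_file *f, char *dst, unsigned n) {
  if (!f->readable)
    return -1;                  /* operation not allowed by open mode */
  if (f->off >= f->ip->size)
    return 0;                   /* nothing left to read */
  unsigned left = f->ip->size - f->off;
  if (n > left)
    n = left;                   /* clamp to the end of the file */
  memcpy(dst, f->ip->data + f->off, n);
  f->off += n;                  /* advance the offset for the next call */
  return (int)n;
}
```

Successive calls therefore continue where the previous one stopped, which is the behaviour user programs expect from read.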
  104. 104. System calls • sys_link and sys_unlink edit directories by creating or removing references to inodes • sys_link creates a new name for an existing inode: 1 Takes as arguments two strings, old and new 2 Increments the nlink field (number of links) of old’s inode 3 Creates a new directory entry pointing at old’s inode 4 The new directory entry refers to the same inode as the existing one, so it must be on the same device
  109. 109. create • Creates a new name for a new inode • Generalizes the three file-creating system calls: 1 open with the O_CREATE flag creates a new file 2 mkdir creates a new directory 3 mkdev (the mknod system call in the xv6 source) creates a new device file
  113. 113. create (2) • Makes a call to dirlookup to check whether the name already exists • If it does not exist, creates a new inode via a call to ialloc • If create has been invoked by mkdir (T_DIR), it initializes the new directory with . and .. entries • Finally, it links the new inode into the parent directory
  117. 117. Buffer cache eviction policy • xv6’s buffer cache uses simple LRU eviction • A number of different policies can be implemented, such as FIFO, not frequently used, aging, random, etc. • The buffer cache is currently a linked list, but a more efficient implementation could replace it with a hash table and/or a heap • The buffer cache can also be integrated with the virtual memory system to enable memory-mapped files (mmap in Linux)
  118. 118. Today’s task • xv6 has no support for memory-mapped files • Come up with a design to implement mmap (see http://man7.org/linux/man-pages/man2/mmap.2.html)
  119. 119. Reading(s) • Chapter 6, “File system”, from “Code: directory layer” onwards, in “xv6: a simple, Unix-like teaching operating system”