AOS Lab 9: File system -- Of buffers, logs, and blocks


  1. 1. Lab 9: File system – Of buffers, logs, and blocks Advanced Operating Systems Zubair Nabi zubair.nabi@itu.edu.pk April 3, 2013
  4. 4. Introduction
     The purpose of a file system is to:
       1. Organize and store data
       2. Support sharing of data among users and applications
       3. Ensure persistence of data after a reboot
  11. 11. Challenges
     • Need on-disk data structures to:
       • Represent the tree of named directories and files
       • Record the identities of the blocks that hold each file’s content
       • Keep track of the areas of the disk which are free
     • The file system needs to support crash recovery
       • A restart must not corrupt the file system or leave it in an inconsistent state
     • The file system can be accessed by multiple processes at the same time, and this access needs to be synchronized
     • Disk access is orders of magnitude slower than memory access, so the file system must maintain an in-memory cache of popular blocks
  12. 12. xv6 FS layers (top to bottom: abstraction on the left, implementing layer on the right)
       System calls    File descriptors
       Pathnames       Recursive lookup
       Directories     Directory inodes
       Files           Inodes and block allocator
       Transactions    Logging
       Blocks          Buffer cache
  16. 16. xv6 FS layers (2)
       1. Buffer cache: Reads and writes blocks on the IDE disk via the buffer cache, which synchronizes access to disk blocks
          • Ensures that only one kernel process can edit any particular block at a time
       2. Logging: Ensures atomicity by enabling higher layers to wrap updates to several blocks in a transaction
       3. Inodes and block allocator: Provides unnamed files; each unnamed file is represented by an inode and a sequence of blocks holding the file content
  20. 20. xv6 FS layers (3)
       4. Directory inodes: Implements directories as a special kind of inode
          • The content of this inode is a sequence of directory entries, each of which contains a name and a reference to the named file’s inode
       5. Recursive lookup: Provides hierarchical path names such as /foo/bar/baz.txt, via recursive lookup
       6. File descriptors: Abstracts many Unix resources, such as pipes, devices, and files, using the file system interface
  25. 25. File system layout
     • xv6 lays out inodes and content blocks on the disk by dividing the disk into several sections
       | boot | super | inodes... | bitmap... | data... | log... |
          0      1       2
     • Block 0 holds the boot sector
     • Block 1 (called the superblock) contains metadata about the file system
       • File system size in blocks, the number of data blocks, the number of inodes, and the number of blocks in the log
     • Blocks starting at 2 hold inodes, with multiple inodes per block
  28. 28. File system layout
     • Inode blocks are followed by bitmap blocks, which keep track of the data blocks in use
     • Bitmap blocks are followed by data blocks, which hold file and directory contents
     • Finally, the blocks at the end of the disk hold the log, which the transaction layer requires
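The layout above can be turned into a little arithmetic. The following is a minimal sketch, not the xv6 macros themselves: the block size, the on-disk inode size, and all helper names are assumptions chosen for illustration, and the real xv6 code may pad the inode area differently.

```c
#include <assert.h>
#include <stdint.h>

#define BSIZE       512                   /* block size in bytes (assumed) */
#define DINODE_SIZE 64                    /* on-disk inode size in bytes (assumed) */
#define IPB         (BSIZE / DINODE_SIZE) /* inodes per block */
#define BPB         (BSIZE * 8)           /* bitmap bits per block */

/* The four superblock fields listed on the slide. */
struct superblock {
  uint32_t size;    /* file system size in blocks */
  uint32_t nblocks; /* number of data blocks */
  uint32_t ninodes; /* number of inodes */
  uint32_t nlog;    /* number of log blocks */
};

/* Block that holds inode number inum; inode blocks start at block 2. */
static uint32_t inode_block(uint32_t inum)
{
  return 2 + inum / IPB;
}

/* First bitmap block: directly after the inode blocks (rounded up). */
static uint32_t bitmap_start(const struct superblock *sb)
{
  return 2 + (sb->ninodes + IPB - 1) / IPB;
}

/* First data block: after enough bitmap blocks to cover every block. */
static uint32_t data_start(const struct superblock *sb)
{
  return bitmap_start(sb) + (sb->size + BPB - 1) / BPB;
}

/* First log block: the log occupies the last nlog blocks of the disk. */
static uint32_t log_start(const struct superblock *sb)
{
  assert(sb->nlog <= sb->size);
  return sb->size - sb->nlog;
}
```

For a hypothetical 1024-block disk with 200 inodes and a 10-block log, this places the bitmap at block 27, data at block 28, and the log at block 1014.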
  33. 33. Buffer cache layer
     • Has two main jobs:
       1. Synchronize access to disk blocks
       2. Cache popular blocks
     • Main interface:
       1. bread: Obtains a buffer containing a copy of a block
       2. bwrite: Writes a modified buffer
       3. brelse: Releases a buffer (after a read or write)
  37. 37. Buffer cache layer (2)
     • Synchronizes access to each block by allowing only a single kernel thread to have a reference to the block’s buffer
     • If one thread is holding a reference to a buffer, other threads will sleep on it
     • The buffer cache has a fixed number of buffers to host disk blocks
     • If higher layers ask for a block that is not cached, the buffer cache recycles the least recently used buffer for this block
  42. 42. Buffer cache
     • The buffer cache is a doubly-linked list of struct buf, with NBUF buffers, accessed via bcache.head
     • A buffer has three state bits:
       1. B_VALID: the buffer contains a valid copy of the block
       2. B_DIRTY: the buffer content has been modified and must be written to disk
       3. B_BUSY: some kernel thread holds a reference to the buffer
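As a rough illustration of how the three state bits interact, here is a sketch of a buffer structure and a few helpers. The flag values and field names follow what I believe xv6 uses, but treat them as assumptions; the helper functions are hypothetical and exist only for this example.

```c
#include <assert.h>

#define B_BUSY  0x1 /* some thread holds a reference to the buffer */
#define B_VALID 0x2 /* buffer holds a valid copy of the disk block */
#define B_DIRTY 0x4 /* buffer was modified and must be written out */

struct buf {
  int flags;
  unsigned int dev;
  unsigned int sector;
  struct buf *prev; /* toward more recently used */
  struct buf *next; /* toward less recently used */
  unsigned char data[512];
};

/* A buffer without B_VALID has to be filled from disk before use. */
static int needs_disk_read(const struct buf *b)
{
  return (b->flags & B_VALID) == 0;
}

/* A buffer with B_DIRTY has to be written back to disk. */
static int needs_disk_write(const struct buf *b)
{
  return (b->flags & B_DIRTY) != 0;
}

/* Only the holder of the buffer may mark it as modified. */
static void mark_modified(struct buf *b)
{
  assert(b->flags & B_BUSY);
  b->flags |= B_DIRTY;
}

/* After the disk driver finishes, memory and disk agree. */
static void io_done(struct buf *b)
{
  b->flags |= B_VALID;
  b->flags &= ~B_DIRTY;
}
```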
  44. 44. bread
     • Makes a call to bget() to get a buffer for the given sector
     • If the buffer is not B_VALID, it makes a call to iderw to read it into the buffer cache
  45. 45. Code: bread
       struct buf*
       bread(uint dev, uint sector)
       {
         struct buf *b;
         b = bget(dev, sector);
         if(!(b->flags & B_VALID))
           iderw(b);
         return b;
       }
  50. 50. bget
     • Scans the buffer list for a buffer that matches the given device and sector (uint dev, uint sector)
       1. If such a buffer is present and B_BUSY is not set, it sets B_BUSY and returns the buffer
       2. If B_BUSY is set, it goes to sleep on the buffer
          • Important: After bget wakes up, it cannot assume that the buffer is now available, because it might have been reused for a different sector, so it starts all over
       3. If the buffer is not present, it reuses an existing buffer, edits its metadata to record the new uint dev and uint sector, sets B_BUSY, and clears B_VALID and B_DIRTY
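The three cases can be sketched in a single-threaded toy version. This is not the xv6 function: there is no lock and no sleep, so the sketch returns NULL at the point where a kernel thread would sleep and then restart the scan. The names bget_sketch and binit, the NBUF value, and the choice to skip dirty buffers when recycling are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NBUF    3
#define B_BUSY  0x1
#define B_VALID 0x2
#define B_DIRTY 0x4

struct buf {
  int flags;
  unsigned int dev, sector;
  struct buf *prev, *next;
};

static struct {
  struct buf buf[NBUF];
  struct buf head; /* sentinel: head.next is most recently used */
} bcache;

/* Link all buffers into a circular list around the sentinel. */
static void binit(void)
{
  bcache.head.prev = bcache.head.next = &bcache.head;
  for (int i = 0; i < NBUF; i++) {
    struct buf *b = &bcache.buf[i];
    b->flags = 0;
    b->dev = (unsigned int)-1; /* matches no real device */
    b->sector = 0;
    b->next = bcache.head.next;
    b->prev = &bcache.head;
    bcache.head.next->prev = b;
    bcache.head.next = b;
  }
}

static struct buf *bget_sketch(unsigned int dev, unsigned int sector)
{
  struct buf *b;

  /* Cases 1 and 2: is the sector already cached? */
  for (b = bcache.head.next; b != &bcache.head; b = b->next) {
    if (b->dev == dev && b->sector == sector) {
      if (b->flags & B_BUSY)
        return NULL; /* a kernel thread would sleep, then rescan */
      b->flags |= B_BUSY;
      return b;
    }
  }

  /* Case 3: recycle, starting from the least recently used end. */
  for (b = bcache.head.prev; b != &bcache.head; b = b->prev) {
    if ((b->flags & (B_BUSY | B_DIRTY)) == 0) {
      b->dev = dev;
      b->sector = sector;
      b->flags = B_BUSY; /* sets B_BUSY, clears B_VALID and B_DIRTY */
      return b;
    }
  }
  return NULL; /* no buffer available */
}
```

Returning NULL for a busy buffer keeps the sketch testable in user space; the restart-after-wakeup rule from the slide corresponds to calling bget_sketch again from the top.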
  53. 53. bwrite
     • Once bread returns a buffer, the caller has exclusive use of it
     • If the caller writes to the buffer, it must call bwrite
     • bwrite sets B_DIRTY and makes a call to iderw
  54. 54. Code: bwrite
       void
       bwrite(struct buf *b)
       {
         if((b->flags & B_BUSY) == 0)
           panic("bwrite");
         b->flags |= B_DIRTY;
         iderw(b);
       }
  58. 58. brelse
     • Moves the buffer from its current position to the front of the buffer cache linked list, clears the B_BUSY bit, and wakes up any processes sleeping on that particular buffer
     • This move keeps the buffers ordered by how recently they were used
     • Why do we need to do this?
       • It makes the scans in bget efficient. Remember, it is a doubly linked list: the search for a cached block starts at the most recently used end, and the search for a buffer to recycle starts at the least recently used end
  59. 59. Code: brelse
       void
       brelse(struct buf *b)
       {
         if((b->flags & B_BUSY) == 0)
           panic("brelse");
         acquire(&bcache.lock);
         b->next->prev = b->prev;
         b->prev->next = b->next;
         b->next = bcache.head.next;
         b->prev = &bcache.head;
         bcache.head.next->prev = b;
         bcache.head.next = b;
         b->flags &= ~B_BUSY;
         wakeup(b);
         release(&bcache.lock);
       }
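To see why the pointer surgery in brelse yields a recency ordering, the list handling can be isolated from the locking and flags. The sketch below uses a hypothetical struct node and helper names; only the unlink and relink steps mirror the code above.

```c
#include <assert.h>

struct node {
  int id;
  struct node *prev, *next;
};

/* An empty circular list is a sentinel that points at itself. */
static void list_init(struct node *head)
{
  head->prev = head->next = head;
}

/* Insert n directly after the sentinel, i.e. at the front. */
static void push_front(struct node *head, struct node *n)
{
  n->next = head->next;
  n->prev = head;
  head->next->prev = n;
  head->next = n;
}

/* Unlink n from wherever it is, then reinsert it at the front. */
static void move_to_front(struct node *head, struct node *n)
{
  n->next->prev = n->prev;
  n->prev->next = n->next;
  push_front(head, n);
}

/* The front of the list is the most recently released element. */
static struct node *most_recent(struct node *head)
{
  return head->next;
}

/* The back of the list is the best candidate for recycling. */
static struct node *least_recent(struct node *head)
{
  return head->prev;
}
```

Because every release moves an element to the front, walking next pointers visits elements from most to least recently used, and walking prev pointers does the opposite.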
  65. 65. Logging layer
     • xv6 implements file system fault tolerance through a simple logging mechanism
     • System calls do not directly write file system data structures
     • Instead:
       1. A system call first writes a description of all the disk writes that it wishes to perform to a log on the disk
       2. It then writes a special commit record to the log to specify that it contains a complete operation
       3. Next it copies the required writes to the on-disk file system data structures
       4. Finally, it deletes the log
  68. 68. Recovery
     • In case of a reboot, the file system performs recovery by looking at the log
     • If the log contains the commit record, the recovery code copies the required writes to the on-disk data structures
     • If the log does not contain a complete operation, it is ignored and deleted
  71. 71. Correctness of recovery mechanism
     • If the crash occurs before the commit record, the log will be ignored, and the state of the disk will stay unmodified
     • If the crash occurs after the commit record, then the recovery will replay all of the operation’s writes, even repeating them if the crash occurred during the write to the on-disk data structure
     • In both cases, the correctness of the file system is preserved: Either all writes are reflected on the disk or none
  79. 79. Log design
     • The log resides at a fixed location at the end of the disk
     • It consists of a header block and a set of data blocks
     • The header block contains:
       1. An array of sector numbers, one for each of the logged data blocks
       2. Count of logged blocks
     • The header block is written when a transaction commits, not before
     • The count is set to zero once all logged blocks have been reflected in the file system
     • The count will be zero in case of a crash before a commit
     • The count will be non-zero in case of a crash after a commit
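The role of the count as the commit record can be checked with a small simulation. This is a sketch under heavy simplifying assumptions: the "disk" is an array of integers with one integer per block, the header is kept in a separate struct standing in for the on-disk header block, and every name and constant below is hypothetical.

```c
#include <assert.h>

#define LOGSIZE  4  /* max logged blocks per transaction (assumed) */
#define NDISK    16 /* toy disk size in blocks */
#define LOGSTART 10 /* header block; log data blocks follow it */

struct logheader {
  int n;               /* count of logged blocks; non-zero means committed */
  int sector[LOGSIZE]; /* home sector of each logged block */
};

struct toydisk {
  int block[NDISK];      /* each "block" is a single int */
  struct logheader head; /* stands in for the on-disk header block */
};

/* Step 1: copy the new contents into the log data blocks. */
static void log_stage(struct toydisk *d, const int *vals, int n)
{
  assert(n >= 0 && n <= LOGSIZE);
  for (int i = 0; i < n; i++)
    d->block[LOGSTART + 1 + i] = vals[i];
}

/* Step 2: write the header. On a real disk this is one block
 * write, and it is the commit point of the transaction. */
static void log_commit(struct toydisk *d, const int *sectors, int n)
{
  assert(n >= 0 && n <= LOGSIZE);
  for (int i = 0; i < n; i++)
    d->head.sector[i] = sectors[i];
  d->head.n = n;
}

/* Step 3: copy each logged block to its home location. */
static void log_install(struct toydisk *d)
{
  for (int i = 0; i < d->head.n; i++)
    d->block[d->head.sector[i]] = d->block[LOGSTART + 1 + i];
}

/* Recovery: replay whatever was committed, then clear the log. */
static void log_recover(struct toydisk *d)
{
  log_install(d);
  d->head.n = 0;
}
```

A crash before log_commit leaves the count at zero, so recovery installs nothing; a crash after it leaves a non-zero count, so recovery repeats the whole install, which is harmless because copying the same block twice gives the same result.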
  86. 86. Log design (2)
     • Each system call’s code marks the start and end of the sequence of writes that must be atomic (a transaction)
     • Only one system call can be in a transaction at any given time to ensure correctness
     • The log holds at most one transaction at a time
     • Only read system calls can execute concurrently with a transaction
     • A fixed amount of space on the disk is dedicated to hold the log
     • No system call can write more distinct blocks than the size of the log
     • Large writes are broken into multiple smaller writes so that each write can fit in the log
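Splitting a large write into log-sized pieces might look like the loop below. It is a sketch only: LOGSIZE, the per-transaction byte budget, and the callback interface are assumptions for illustration, and the comments mark where begin_trans and commit_trans would bracket each chunk.

```c
#include <assert.h>

#define BSIZE   512
#define LOGSIZE 10 /* log blocks available (assumed) */

/* Assumed budget: reserve a few log blocks for metadata such as the
 * inode and indirect block, and halve the rest to leave headroom. */
static int max_bytes_per_trans(void)
{
  return ((LOGSIZE - 1 - 1 - 2) / 2) * BSIZE;
}

/* Hypothetical callback: writes len bytes at offset off and returns
 * the number of bytes written, or a negative value on error. */
typedef int (*write_fn)(int off, int len, void *ctx);

/* Write n bytes starting at off, at most max bytes per transaction.
 * Returns the number of bytes actually written. */
static int write_in_chunks(int off, int n, int max, write_fn w, void *ctx)
{
  int i = 0;

  assert(max > 0);
  while (i < n) {
    int n1 = n - i;
    if (n1 > max)
      n1 = max;
    /* begin_trans() would go here */
    int r = w(off + i, n1, ctx);
    /* commit_trans() would go here */
    if (r < 0)
      break;
    i += r;
    if (r != n1)
      break; /* short write: stop rather than loop forever */
  }
  return i;
}

/* Demo callback that counts transactions and pretends to succeed. */
static int count_chunk(int off, int len, void *ctx)
{
  (void)off;
  (*(int *)ctx)++;
  return len;
}
```

With these assumed constants the budget is 1536 bytes, so a 4000-byte write is carried out as three separate transactions.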
  87. 87. Code: Typical system call usage of log
       begin_trans();
       ...
       bp = bread(...);
       bp->data[...] = ...;
       log_write(bp);
       ...
       commit_trans();
  94. 94. Log functions
     • begin_trans: Waits until it obtains exclusive use of the log
     • log_write:
       • Appends the block’s new content to the log on the disk
       • Leaves the modified block in the buffer cache so that subsequent reads of the block during the transaction will yield the updated state
       • Records the block’s sector number in memory so that it can detect when a block is written multiple times during a transaction and overwrite the block’s previous copy in the log
     • commit_trans:
       1. Writes the log’s header block to disk, updating the count
       2. Calls install_trans to copy each block from the log to the relevant location on the disk
       3. Sets the count in the log header to zero
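The third log_write bullet, reusing a log slot when the same block is written twice, can be sketched as a lookup over the in-memory header. The function name log_slot_for and the LOGSIZE value are hypothetical, and returning -1 on a full log is a simplification for this example.

```c
#include <assert.h>

#define LOGSIZE 10 /* max logged blocks per transaction (assumed) */

struct logheader {
  int n;               /* number of blocks logged so far */
  int sector[LOGSIZE]; /* home sector of each logged block */
};

/* Return the log slot that should hold this sector's contents.
 * A sector already in the log gets its existing slot back, so the
 * new contents overwrite the old copy instead of using more space.
 * Returns -1 if the transaction would overflow the log. */
static int log_slot_for(struct logheader *lh, int sector)
{
  assert(lh->n >= 0 && lh->n <= LOGSIZE);
  for (int i = 0; i < lh->n; i++) {
    if (lh->sector[i] == sector)
      return i; /* written before in this transaction */
  }
  if (lh->n >= LOGSIZE)
    return -1;
  lh->sector[lh->n] = sector;
  return lh->n++;
}
```

This is also why the limit on a transaction is stated in terms of distinct blocks: repeated writes to one block consume a single slot.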
  95. 95. Code snippet: filewrite
       begin_trans();
       ilock(f->ip);
       if ((r = writei(f->ip, addr + i, f->off, n1)) > 0)
         f->off += r;
       iunlock(f->ip);
       commit_trans();
  98. 98. Recovery
     • In case of a reboot, the file system performs recovery by looking at the log
     • If the log contains the commit record, the recovery code copies the required writes to the on-disk data structures
     • If the log does not contain a complete operation, it is ignored and deleted
  99. 99. Code snippet: recover_from_log
       static void
       recover_from_log(void)
       {
         read_head();
         install_trans(); // if committed, copy from log to disk
         log.lh.n = 0;
         write_head();    // clear the log
       }
  100. 100. Code snippet: install_trans
       static void
       install_trans(void)
       {
         int tail;
         for (tail = 0; tail < log.lh.n; tail++) {
           struct buf *lbuf = bread(log.dev, log.start+tail+1);    // read log block
           struct buf *dbuf = bread(log.dev, log.lh.sector[tail]); // read dst
           memmove(dbuf->data, lbuf->data, BSIZE);                 // copy block to dst
           bwrite(dbuf);                                           // write dst to disk
           brelse(lbuf);
           brelse(dbuf);
         }
       }
  104. 104. Block allocator
     • Maintains a free bitmap on disk; one bit per block
     • A zero bit means that the block is free while a one indicates that the block is in use
     • The bits for the boot sector, superblock, inode blocks, and bitmap blocks are always set
     • Provides two functions to allocate (balloc()) and de-allocate (bfree()) a block
  107. 107. balloc
     • Calls readsb to read the superblock to get metadata
     • Uses this metadata to traverse the entire bitmap and look for a block whose bitmap bit is zero
     • If it finds a free block, it updates the bitmap and returns the block
  109. 109. bfree
     • Finds the corresponding bitmap block
     • Clears its bitmap bit
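The allocator’s bit manipulation can be sketched against an in-memory bitmap. This is an illustration only: the real balloc and bfree read bitmap blocks through the buffer cache and write them through the log, none of which is modeled here, and NBLOCKS and the function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define NBLOCKS 64 /* toy disk size in blocks (assumed) */

static uint8_t bitmap[NBLOCKS / 8]; /* one bit per block, 0 = free */

static int bit_is_set(uint32_t b)
{
  return (bitmap[b / 8] >> (b % 8)) & 1;
}

/* Mark the metadata blocks (boot, super, inodes, bitmap) as in use. */
static void reserve_blocks(uint32_t first_data_block)
{
  for (uint32_t b = 0; b < first_data_block; b++)
    bitmap[b / 8] |= (uint8_t)(1 << (b % 8));
}

/* Scan for the first zero bit, set it, and return the block number.
 * Returns -1 when no block is free. */
static int balloc_sketch(void)
{
  for (uint32_t b = 0; b < NBLOCKS; b++) {
    uint8_t m = (uint8_t)(1 << (b % 8));
    if ((bitmap[b / 8] & m) == 0) {
      bitmap[b / 8] |= m;
      return (int)b;
    }
  }
  return -1;
}

/* Clear the bit for block b; freeing a free block is a bug. */
static void bfree_sketch(uint32_t b)
{
  uint8_t m = (uint8_t)(1 << (b % 8));
  assert(b < NBLOCKS);
  assert(bitmap[b / 8] & m);
  bitmap[b / 8] &= (uint8_t)~m;
}
```

Since the scan always starts at block 0, a freed block is the first candidate for the next allocation, which keeps the sketch simple at the cost of a linear search.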
  110. 110. Today’s task
     • xv6 does not allow concurrent transactions to the log, which means that if a system call performs a long write operation, all other write system calls will block
     • Come up with a strategy, expressed in pseudo-code, to implement concurrent transactions to the log
  111. 111. Reading(s)
     • Chapter 6, “File system”, up to section “Code: directory layer” from “xv6: a simple, Unix-like teaching operating system”
