Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
File System Implementation - Part2
1. File System Implementation (Part II)
Amir H. Payberah
amir@sics.se
Amirkabir University of Technology
(Tehran Polytechnic)
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 1 / 51
3. File System Layers
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 3 / 51
4. File-System Implementation
Based on several on-disk and in-memory structures.
On-disk
• Boot control block (per volume)
• Volume control block (per volume)
• Directory structure (per file system)
• File control block (per file)
In-memory
• Mount table
• Directory structure
• The open-file table (system-wide and per process)
• Buffers of the file-system blocks
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 4 / 51
5. Virtual File Systems (VFS)
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 5 / 51
9. Free-Space Management (1/2)
File system maintains free-space list to track available blocks.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 9 / 51
10. Free-Space Management (1/2)
File system maintains free-space list to track available blocks.
To create a file, OS searches the free-space list for the required
amount of space and allocates that space to the new file.
• This space is then removed from the free-space list.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 9 / 51
11. Free-Space Management (1/2)
File system maintains free-space list to track available blocks.
To create a file, OS searches the free-space list for the required
amount of space and allocates that space to the new file.
• This space is then removed from the free-space list.
When a file is deleted, its disk space is added to the free-space list.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 9 / 51
12. Free-Space Management (2/2)
Possible techniques:
• Bit vector
• Linked list
• Grouping
• Counting
• Space maps
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 10 / 51
13. Bit Vector
Bit vector or bit map (n blocks)
bit[i] = 1: block i is free
bit[i] = 0: block i is occupied
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 11 / 51
14. Bit Vector
Bit vector or bit map (n blocks)
bit[i] = 1: block i is free
bit[i] = 0: block i is occupied
Block number calculation to find the location of the first free block:
(# of bits per word) × (# of 0-value words) + offset of first 1 bit
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 11 / 51
15. Bit Vector
Bit vector or bit map (n blocks)
bit[i] = 1: block i is free
bit[i] = 0: block i is occupied
Block number calculation to find the location of the first free block:
(# of bits per word) × (# of 0-value words) + offset of first 1 bit
Inefficient unless the entire vector is kept in main memory.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 11 / 51
16. Bit Vector
Bit vector or bit map (n blocks)
bit[i] = 1: block i is free
bit[i] = 0: block i is occupied
Block number calculation to find the location of the first free block:
(# of bits per word) × (# of 0-value words) + offset of first 1 bit
Inefficient unless the entire vector is kept in main memory.
Easy to get contiguous files.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 11 / 51
17. Linked List
Linked list (free-list)
• Cannot get contiguous space easily
• No waste of space
• No need to traverse the entire list
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 12 / 51
18. Grouping
A modified version of the linked list approach
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
19. Grouping
A modified version of the linked list approach
It stores the addresses of n free blocks in the first free block.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
20. Grouping
A modified version of the linked list approach
It stores the addresses of n free blocks in the first free block.
The first n − 1 of these blocks are actually free.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
21. Grouping
A modified version of the linked list approach
It stores the addresses of n free blocks in the first free block.
The first n − 1 of these blocks are actually free.
The last block contains the addresses of another n free blocks, and
so on.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
22. Grouping
A modified version of the linked list approach
It stores the addresses of n free blocks in the first free block.
The first n − 1 of these blocks are actually free.
The last block contains the addresses of another n free blocks, and
so on.
Easy to get contiguous files.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
23. Counting
Because space is frequently contiguously used and freed, with
contiguous-allocation allocation, extents, or clustering.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 14 / 51
24. Counting
Because space is frequently contiguously used and freed, with
contiguous-allocation allocation, extents, or clustering.
Keep address of first free block and count of following free blocks.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 14 / 51
25. Counting
Because space is frequently contiguously used and freed, with
contiguous-allocation allocation, extents, or clustering.
Keep address of first free block and count of following free blocks.
Free space list then has entries containing addresses and counts.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 14 / 51
26. Space Map
Used in ZFS
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
27. Space Map
Used in ZFS
Consider meta-data I/O on very large file systems: full data struc-
tures like bit maps cannot fit in memory
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
28. Space Map
Used in ZFS
Consider meta-data I/O on very large file systems: full data struc-
tures like bit maps cannot fit in memory
Divides device space into metaslab units and manages metaslabs.
• A volume can contain hundreds of metaslabs.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
29. Space Map
Used in ZFS
Consider meta-data I/O on very large file systems: full data struc-
tures like bit maps cannot fit in memory
Divides device space into metaslab units and manages metaslabs.
• A volume can contain hundreds of metaslabs.
Each metaslab has associated space map: uses counting algorithm
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
30. Space Map
Used in ZFS
Consider meta-data I/O on very large file systems: full data struc-
tures like bit maps cannot fit in memory
Divides device space into metaslab units and manages metaslabs.
• A volume can contain hundreds of metaslabs.
Each metaslab has associated space map: uses counting algorithm
Rather than write counting structures to disk, it uses log-structured
file-system techniques to record them.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
31. Space Map
Used in ZFS
Consider meta-data I/O on very large file systems: full data struc-
tures like bit maps cannot fit in memory
Divides device space into metaslab units and manages metaslabs.
• A volume can contain hundreds of metaslabs.
Each metaslab has associated space map: uses counting algorithm
Rather than write counting structures to disk, it uses log-structured
file-system techniques to record them.
Metaslab activity → load space map into memory in balanced-tree
structure
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
33. Efficiency and Performance
Disks are the major bottleneck in system performance.
A variety of techniques used to improve the efficiency and perfor-
mance of secondary storage.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 17 / 51
34. Efficiency (1/2)
Efficiency dependent on disk allocation and directory algorithms.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
35. Efficiency (1/2)
Efficiency dependent on disk allocation and directory algorithms.
Pre-allocation or as-needed allocation of metadata structures.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
36. Efficiency (1/2)
Efficiency dependent on disk allocation and directory algorithms.
Pre-allocation or as-needed allocation of metadata structures.
• E.g., Unix inodes are pre-allocated on a volume.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
37. Efficiency (1/2)
Efficiency dependent on disk allocation and directory algorithms.
Pre-allocation or as-needed allocation of metadata structures.
• E.g., Unix inodes are pre-allocated on a volume.
• Even an empty disk has a percentage of its space lost to inodes.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
38. Efficiency (1/2)
Efficiency dependent on disk allocation and directory algorithms.
Pre-allocation or as-needed allocation of metadata structures.
• E.g., Unix inodes are pre-allocated on a volume.
• Even an empty disk has a percentage of its space lost to inodes.
• It improves the file system’s performance, but consumes disk space.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
39. Efficiency (2/2)
Types of data kept in file’s directory entry.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
40. Efficiency (2/2)
Types of data kept in file’s directory entry.
• E.g., ”last write date” is recorded for some files.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
41. Efficiency (2/2)
Types of data kept in file’s directory entry.
• E.g., ”last write date” is recorded for some files.
• Updating this information is inefficient for frequently accessed files.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
42. Efficiency (2/2)
Types of data kept in file’s directory entry.
• E.g., ”last write date” is recorded for some files.
• Updating this information is inefficient for frequently accessed files.
• Benefit against its performance cost?
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
43. Efficiency (2/2)
Types of data kept in file’s directory entry.
• E.g., ”last write date” is recorded for some files.
• Updating this information is inefficient for frequently accessed files.
• Benefit against its performance cost?
Fixed-size or varying-size data structures.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
44. Efficiency (2/2)
Types of data kept in file’s directory entry.
• E.g., ”last write date” is recorded for some files.
• Updating this information is inefficient for frequently accessed files.
• Benefit against its performance cost?
Fixed-size or varying-size data structures.
• E.g., Fix length process table: no more process after the process
table becomes full
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
45. Efficiency (2/2)
Types of data kept in file’s directory entry.
• E.g., ”last write date” is recorded for some files.
• Updating this information is inefficient for frequently accessed files.
• Benefit against its performance cost?
Fixed-size or varying-size data structures.
• E.g., Fix length process table: no more process after the process
table becomes full
• Make them dynamic.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
46. Performance
Techniques to improve the file system performance:
• Unified buffer cache
• Optimizing sequential access
• Synchronous and asynchronous writes
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 20 / 51
47. Unified Buffer Cache (1/4)
Buffer cache
• Separate section of main memory for frequently used blocks.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 21 / 51
48. Unified Buffer Cache (1/4)
Buffer cache
• Separate section of main memory for frequently used blocks.
Page cache
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 21 / 51
49. Unified Buffer Cache (1/4)
Buffer cache
• Separate section of main memory for frequently used blocks.
Page cache
• Cache file data as pages rather than as file-system blocks.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 21 / 51
50. Unified Buffer Cache (1/4)
Buffer cache
• Separate section of main memory for frequently used blocks.
Page cache
• Cache file data as pages rather than as file-system blocks.
• More efficient: accesses interface with virtual memory rather than
the file system.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 21 / 51
51. Unified Buffer Cache (2/4)
Consider the two alternatives for opening and accessing a file:
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
52. Unified Buffer Cache (2/4)
Consider the two alternatives for opening and accessing a file:
• Memory mapping
• The standard system calls read() and write()
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
53. Unified Buffer Cache (2/4)
Consider the two alternatives for opening and accessing a file:
• Memory mapping
• The standard system calls read() and write()
Without a unified buffer cache:
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
54. Unified Buffer Cache (2/4)
Consider the two alternatives for opening and accessing a file:
• Memory mapping
• The standard system calls read() and write()
Without a unified buffer cache:
• The read() and write() go through
the buffer cache.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
55. Unified Buffer Cache (2/4)
Consider the two alternatives for opening and accessing a file:
• Memory mapping
• The standard system calls read() and write()
Without a unified buffer cache:
• The read() and write() go through
the buffer cache.
• The memory-mapping call, requires
using two caches: the page cache and
the buffer cache.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
56. Unified Buffer Cache (3/4)
Virtual memory does not interface with the buffer cache.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 23 / 51
57. Unified Buffer Cache (3/4)
Virtual memory does not interface with the buffer cache.
• The contents of the file in the buffer cache must be copied into the
page cache: double caching
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 23 / 51
58. Unified Buffer Cache (3/4)
Virtual memory does not interface with the buffer cache.
• The contents of the file in the buffer cache must be copied into the
page cache: double caching
• Waste of memory, CPU and I/O cycles
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 23 / 51
59. Unified Buffer Cache (3/4)
Virtual memory does not interface with the buffer cache.
• The contents of the file in the buffer cache must be copied into the
page cache: double caching
• Waste of memory, CPU and I/O cycles
• Inconsistency
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 23 / 51
60. Unified Buffer Cache (4/4)
With unified buffer cache.
Both memory mapping and the read() and write() use the same
page cache.
LRU for block or page replacement.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 24 / 51
61. Optimizing Sequential Access
Optimizing page replacement in page cache for a file being read or
written sequentially.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 25 / 51
62. Optimizing Sequential Access
Optimizing page replacement in page cache for a file being read or
written sequentially.
• The most recently used page will be used last, or perhaps never
again.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 25 / 51
63. Optimizing Sequential Access
Optimizing page replacement in page cache for a file being read or
written sequentially.
• The most recently used page will be used last, or perhaps never
again.
Free-behind: removes a page from the buffer as soon as the next
page is requested.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 25 / 51
64. Optimizing Sequential Access
Optimizing page replacement in page cache for a file being read or
written sequentially.
• The most recently used page will be used last, or perhaps never
again.
Free-behind: removes a page from the buffer as soon as the next
page is requested.
Read-ahead: a requested page and several subsequent pages are
read and cached.
• Retrieving these data from the disk in one transfer and caching
them saves time.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 25 / 51
65. Synchronous and Asynchronous Writes
Synchronous writes sometimes requested by applications or needed
by OS.
• No buffering/caching: writes must hit disk before acknowledgement.
Asynchronous writes more common, buffer-able, faster.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 26 / 51
67. Recovery
A system crash can cause inconsistencies among on-disk file-system
data structures.
• E.g., directory structures, free-block pointers, and free FCB
pointers.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 28 / 51
68. Recovery
A system crash can cause inconsistencies among on-disk file-system
data structures.
• E.g., directory structures, free-block pointers, and free FCB
pointers.
• For example, the free FCB count might indicate that an FCB had
been allocated, but the directory structure might not point to the
FCB.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 28 / 51
69. Recovery
A system crash can cause inconsistencies among on-disk file-system
data structures.
• E.g., directory structures, free-block pointers, and free FCB
pointers.
• For example, the free FCB count might indicate that an FCB had
been allocated, but the directory structure might not point to the
FCB.
Methods to deal with corruption:
• Consistency checking
• Log-structured file systems
• Backup and restore
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 28 / 51
70. Consistency Checking
To detect a problem, OS scans of all the metadata on each file
system to confirm or deny the consistency of the system.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 29 / 51
71. Consistency Checking
To detect a problem, OS scans of all the metadata on each file
system to confirm or deny the consistency of the system.
Consistency checking: compares data in directory structure with
data blocks on disk, and tries to fix inconsistencies.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 29 / 51
72. Consistency Checking
To detect a problem, OS scans of all the metadata on each file
system to confirm or deny the consistency of the system.
Consistency checking: compares data in directory structure with
data blocks on disk, and tries to fix inconsistencies.
Can be slow and sometimes fails.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 29 / 51
73. Log Structured File Systems (1/2)
Transaction: a set of operations for performing a specific task.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
74. Log Structured File Systems (1/2)
Transaction: a set of operations for performing a specific task.
Log structured (or journaling) file systems record each metadata
update to the file system as a transaction.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
75. Log Structured File Systems (1/2)
Transaction: a set of operations for performing a specific task.
Log structured (or journaling) file systems record each metadata
update to the file system as a transaction.
All transactions are written to a log.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
76. Log Structured File Systems (1/2)
Transaction: a set of operations for performing a specific task.
Log structured (or journaling) file systems record each metadata
update to the file system as a transaction.
All transactions are written to a log.
• A transaction is considered committed, once it is written to the log.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
77. Log Structured File Systems (1/2)
Transaction: a set of operations for performing a specific task.
Log structured (or journaling) file systems record each metadata
update to the file system as a transaction.
All transactions are written to a log.
• A transaction is considered committed, once it is written to the log.
• Sometimes to a separate device or section of disk.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
78. Log Structured File Systems (2/2)
The transactions in the log are asynchronously written to the file
system structures.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 31 / 51
79. Log Structured File Systems (2/2)
The transactions in the log are asynchronously written to the file
system structures.
• When the file system structures are modified, the transaction is
removed from the log.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 31 / 51
80. Log Structured File Systems (2/2)
The transactions in the log are asynchronously written to the file
system structures.
• When the file system structures are modified, the transaction is
removed from the log.
If the file system crashes, all remaining transactions in the log must
still be performed.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 31 / 51
81. Log Structured File Systems (2/2)
The transactions in the log are asynchronously written to the file
system structures.
• When the file system structures are modified, the transaction is
removed from the log.
If the file system crashes, all remaining transactions in the log must
still be performed.
Faster recovery from crash, removes chance of inconsistency of
metadata.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 31 / 51
82. Backup and Restore (1/2)
Back up data from disk to another storage device, such as a mag-
netic tape or other hard disk.
Recovery from the loss of an individual file, or of an entire disk, may
then be a matter of restoring the data from backup.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 32 / 51
83. Backup and Restore (2/2)
A typical backup schedule:
• Day 1. full backup: copy all files from the disk to a backup medium.
• Day 2. incremental backup: copy all files changed since day 1 to
another medium.
• Day 3. incremental backup: copy all files changed since day 2 to
another medium.
• ...
• Day N. incremental backup: copy all files changed since day N-1 to
another medium. Then go back to day 1.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 33 / 51
84. NFS
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 34 / 51
85. The Network File System (1/2)
An implementation and a specification of a software system for ac-
cessing remote files across LANs or WANs.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 35 / 51
86. The Network File System (1/2)
An implementation and a specification of a software system for ac-
cessing remote files across LANs or WANs.
NFS views a set of interconnected workstations as a set of indepen-
dent machines with independent file systems.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 35 / 51
87. The Network File System (1/2)
An implementation and a specification of a software system for ac-
cessing remote files across LANs or WANs.
NFS views a set of interconnected workstations as a set of indepen-
dent machines with independent file systems.
The goal is to allow some degree of sharing among these file systems
in a transparent manner.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 35 / 51
88. The Network File System (1/2)
An implementation and a specification of a software system for ac-
cessing remote files across LANs or WANs.
NFS views a set of interconnected workstations as a set of indepen-
dent machines with independent file systems.
The goal is to allow some degree of sharing among these file systems
in a transparent manner.
Sharing is based on a client-server relationship: either TCP or
UDP/IP.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 35 / 51
89. The Network File System (2/2)
A remote directory is mounted over a local file system directory.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
90. The Network File System (2/2)
A remote directory is mounted over a local file system directory.
• The mounted directory looks like an integral subtree of the local file
system.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
91. The Network File System (2/2)
A remote directory is mounted over a local file system directory.
• The mounted directory looks like an integral subtree of the local file
system.
• It replaces the subtree descending from the local directory.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
92. The Network File System (2/2)
A remote directory is mounted over a local file system directory.
• The mounted directory looks like an integral subtree of the local file
system.
• It replaces the subtree descending from the local directory.
Specification of the remote directory for the mount operation is
non-transparent.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
93. The Network File System (2/2)
A remote directory is mounted over a local file system directory.
• The mounted directory looks like an integral subtree of the local file
system.
• It replaces the subtree descending from the local directory.
Specification of the remote directory for the mount operation is
non-transparent.
• The host name of the remote directory has to be provided.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
94. The Network File System (2/2)
A remote directory is mounted over a local file system directory.
• The mounted directory looks like an integral subtree of the local file
system.
• It replaces the subtree descending from the local directory.
Specification of the remote directory for the mount operation is
non-transparent.
• The host name of the remote directory has to be provided.
• Files in the remote directory can then be accessed in a transparent
manner.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
95. NFS Mount (1/3)
Three independent file systems of machines named U, S1, and S2.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 37 / 51
96. NFS Mount (2/3)
Mounting S1:/usr/shared over U:/usr/local.
mount -t nfs S1:/usr/shared /usr/local
U can access any file within the dir1 using /usr/local/dir1.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 38 / 51
97. NFS Mount (3/3)
Cascading: mount a file system over another file system that is
remotely mounted.
Mounting S2:/usr/dir2 over U:/usr/local/dir1, which is al-
ready remotely mounted from S1.
mount -t nfs S2:/usr/dir2 /usr/local/dir1
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 39 / 51
98. NFS Mount Protocol (1/2)
Establishes initial logical connection between server and client.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 40 / 51
99. NFS Mount Protocol (1/2)
Establishes initial logical connection between server and client.
Mount operation includes name of remote directory to be mounted
and name of server machine storing it.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 40 / 51
100. NFS Mount Protocol (1/2)
Establishes initial logical connection between server and client.
Mount operation includes name of remote directory to be mounted
and name of server machine storing it.
Mount request is mapped to corresponding RPC and forwarded to
mount server.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 40 / 51
101. NFS Mount Protocol (1/2)
Establishes initial logical connection between server and client.
Mount operation includes name of remote directory to be mounted
and name of server machine storing it.
Mount request is mapped to corresponding RPC and forwarded to
mount server.
Export list: specifies local file systems that server exports for mount-
ing, along with names of machines that are permitted to mount
them.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 40 / 51
102. NFS Mount Protocol (2/2)
Following a mount request that conforms to its export list, the server
returns a file handle (a key for further accesses).
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 41 / 51
103. NFS Mount Protocol (2/2)
Following a mount request that conforms to its export list, the server
returns a file handle (a key for further accesses).
File handle: a file-system identifier and an inode number to identify
the mounted directory within the exported file system.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 41 / 51
104. NFS Mount Protocol (2/2)
Following a mount request that conforms to its export list, the server
returns a file handle (a key for further accesses).
File handle: a file-system identifier and an inode number to identify
the mounted directory within the exported file system.
The mount operation changes only the user’s view and does not
affect the server side.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 41 / 51
105. NFS Protocol (1/2)
Provides a set of RPCs for remote file operations.
The procedures support the following operations:
• Searching for a file within a directory
• Reading a set of directory entries
• Manipulating links and directories
• Accessing file attributes
• Reading and writing files
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 42 / 51
106. NFS Protocol (2/2)
NFS servers are stateless; each request has to provide a full set of
arguments.
• NFS V4 is just coming available: very different, stateful
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 43 / 51
107. NFS Protocol (2/2)
NFS servers are stateless; each request has to provide a full set of
arguments.
• NFS V4 is just coming available: very different, stateful
Modified data must be committed to the server’s disk before results
are returned to the client.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 43 / 51
108. NFS Protocol (2/2)
NFS servers are stateless; each request has to provide a full set of
arguments.
• NFS V4 is just coming available: very different, stateful
Modified data must be committed to the server’s disk before results
are returned to the client.
The NFS protocol does not provide concurrency-control mecha-
nisms.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 43 / 51
109. Schematic View of NFS Architecture
NFS is integrated into the OS via a VFS.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 44 / 51
110. Three Major Layers of NFS Architecture
Unix file-system interface
• Based on the open, read, write, and close calls, and file descriptors.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 45 / 51
111. Three Major Layers of NFS Architecture
Unix file-system interface
• Based on the open, read, write, and close calls, and file descriptors.
Virtual File System (VFS) layer
• Distinguishes local files from remote ones, and local files are further
distinguished according to their file-system types.
• Calls the NFS protocol procedures for remote requests.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 45 / 51
112. Three Major Layers of NFS Architecture
Unix file-system interface
• Based on the open, read, write, and close calls, and file descriptors.
Virtual File System (VFS) layer
• Distinguishes local files from remote ones, and local files are further
distinguished according to their file-system types.
• Calls the NFS protocol procedures for remote requests.
NFS service layer
• Implements the NFS protocol.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 45 / 51
113. NFS Path-Name Translation
It involves the parsing of a path name into separate components.
E.g., /usr/local/dir1/file.txt into usr, local, and dir1.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 46 / 51
114. NFS Path-Name Translation
It involves the parsing of a path name into separate components.
E.g., /usr/local/dir1/file.txt into usr, local, and dir1.
Performs a separate NFS lookup call for every pair of component
name and directory vnode.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 46 / 51
115. NFS Path-Name Translation
It involves the parsing of a path name into separate components.
E.g., /usr/local/dir1/file.txt into usr, local, and dir1.
Performs a separate NFS lookup call for every pair of component
name and directory vnode.
To make lookup faster, a directory-name-lookup cache on the
client’s side holds the vnodes for remote directory names.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 46 / 51
116. NFS Remote Operations (1/2)
Nearly one-to-one correspondence between regular Unix system calls
and the NFS protocol RPCs.
• Except opening and closing files.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 47 / 51
117. NFS Remote Operations (1/2)
Nearly one-to-one correspondence between regular Unix system calls
and the NFS protocol RPCs.
• Except opening and closing files.
NFS adheres to the remote-service paradigm, but employs buffering
and caching techniques for the sake of performance.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 47 / 51
118. NFS Remote Operations (2/2)
File-attribute cache: the attribute cache is updated whenever new
attributes arrive from the server.
• When a file is opened, the kernel checks with the remote server
whether to fetch or revalidate the cached attributes.
• By default, discarded after 60 seconds.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 48 / 51
119. NFS Remote Operations (2/2)
File-attribute cache: the attribute cache is updated whenever new
attributes arrive from the server.
• When a file is opened, the kernel checks with the remote server
whether to fetch or revalidate the cached attributes.
• By default, discarded after 60 seconds.
File-blocks cache
• Cached file blocks are used only if the corresponding cached
attributes are up to date.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 48 / 51
120. NFS Remote Operations (2/2)
File-attribute cache: the attribute cache is updated whenever new
attributes arrive from the server.
• When a file is opened, the kernel checks with the remote server
whether to fetch or revalidate the cached attributes.
• By default, discarded after 60 seconds.
File-blocks cache
• Cached file blocks are used only if the corresponding cached
attributes are up to date.
Clients do not free delayed-write blocks until the server confirms
that the data have been written to disk
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 48 / 51
122. Summary
Free space management: bit vector, linked list, grouping, counting,
space maps
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51
123. Summary
Free space management: bit vector, linked list, grouping, counting,
space maps
Efficiency: pre-allocated vs. as-needed allocation structures, types
of data, fixed-size vs. varying size structures
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51
124. Summary
Free space management: bit vector, linked list, grouping, counting,
space maps
Efficiency: pre-allocated vs. as-needed allocation structures, types
of data, fixed-size vs. varying size structures
Performance: unified buffer cache, optimizing sequential access,
synchronous and asynchronous writes
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51
125. Summary
Free space management: bit vector, linked list, grouping, counting,
space maps
Efficiency: pre-allocated vs. as-needed allocation structures, types
of data, fixed-size vs. varying size structures
Performance: unified buffer cache, optimizing sequential access,
synchronous and asynchronous writes
Recovery: consistency checking, log-structured FS, backup and re-
store
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51
126. Summary
Free space management: bit vector, linked list, grouping, counting,
space maps
Efficiency: pre-allocated vs. as-needed allocation structures, types
of data, fixed-size vs. varying size structures
Performance: unified buffer cache, optimizing sequential access,
synchronous and asynchronous writes
Recovery: consistency checking, log-structured FS, backup and re-
store
NFS: mount protocol, path-name translation, remote operation
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51