SlideShare a Scribd company logo
1 of 127
Download to read offline
File System Implementation (Part II)
Amir H. Payberah
amir@sics.se
Amirkabir University of Technology
(Tehran Polytechnic)
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 1 / 51
Reminder
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 2 / 51
File System Layers
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 3 / 51
File-System Implementation
Based on several on-disk and in-memory structures.
On-disk
• Boot control block (per volume)
• Volume control block (per volume)
• Directory structure (per file system)
• File control block (per file)
In-memory
• Mount table
• Directory structure
• The open-file table (system-wide and per process)
• Buffers of the file-system blocks
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 4 / 51
Virtual File Systems (VFS)
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 5 / 51
Directory Implementation
Linear list
Hash table
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 6 / 51
Allocation Methods
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 7 / 51
Free Space Management
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 8 / 51
Free-Space Management (1/2)
File system maintains free-space list to track available blocks.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 9 / 51
Free-Space Management (1/2)
File system maintains free-space list to track available blocks.
To create a file, OS searches the free-space list for the required
amount of space and allocates that space to the new file.
• This space is then removed from the free-space list.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 9 / 51
Free-Space Management (1/2)
File system maintains free-space list to track available blocks.
To create a file, OS searches the free-space list for the required
amount of space and allocates that space to the new file.
• This space is then removed from the free-space list.
When a file is deleted, its disk space is added to the free-space list.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 9 / 51
Free-Space Management (2/2)
Possible techniques:
• Bit vector
• Linked list
• Grouping
• Counting
• Space maps
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 10 / 51
Bit Vector
Bit vector or bit map (n blocks)
bit[i] = 1: block i is free
bit[i] = 0: block i is occupied
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 11 / 51
Bit Vector
Bit vector or bit map (n blocks)
bit[i] = 1: block i is free
bit[i] = 0: block i is occupied
Block number calculation to find the location of the first free block:
(# of bits per word) × (# of 0-value words) + offset of first 1 bit
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 11 / 51
Bit Vector
Bit vector or bit map (n blocks)
bit[i] = 1: block i is free
bit[i] = 0: block i is occupied
Block number calculation to find the location of the first free block:
(# of bits per word) × (# of 0-value words) + offset of first 1 bit
Inefficient unless the entire vector is kept in main memory.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 11 / 51
Bit Vector
Bit vector or bit map (n blocks)
bit[i] = 1: block i is free
bit[i] = 0: block i is occupied
Block number calculation to find the location of the first free block:
(# of bits per word) × (# of 0-value words) + offset of first 1 bit
Inefficient unless the entire vector is kept in main memory.
Easy to get contiguous files.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 11 / 51
Linked List
Linked list (free-list)
• Cannot get contiguous space easily
• No waste of space
• No need to traverse the entire list
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 12 / 51
Grouping
A modified version of the linked list approach
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
Grouping
A modified version of the linked list approach
It stores the addresses of n free blocks in the first free block.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
Grouping
A modified version of the linked list approach
It stores the addresses of n free blocks in the first free block.
The first n − 1 of these blocks are actually free.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
Grouping
A modified version of the linked list approach
It stores the addresses of n free blocks in the first free block.
The first n − 1 of these blocks are actually free.
The last block contains the addresses of another n free blocks, and
so on.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
Grouping
A modified version of the linked list approach
It stores the addresses of n free blocks in the first free block.
The first n − 1 of these blocks are actually free.
The last block contains the addresses of another n free blocks, and
so on.
Easy to get contiguous files.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
Counting
Because space is frequently contiguously used and freed, with
contiguous-allocation allocation, extents, or clustering.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 14 / 51
Counting
Because space is frequently contiguously used and freed, with
contiguous-allocation allocation, extents, or clustering.
Keep address of first free block and count of following free blocks.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 14 / 51
Counting
Because space is frequently contiguously used and freed, with
contiguous-allocation allocation, extents, or clustering.
Keep address of first free block and count of following free blocks.
Free space list then has entries containing addresses and counts.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 14 / 51
Space Map
Used in ZFS
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
Space Map
Used in ZFS
Consider meta-data I/O on very large file systems: full data struc-
tures like bit maps cannot fit in memory
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
Space Map
Used in ZFS
Consider meta-data I/O on very large file systems: full data struc-
tures like bit maps cannot fit in memory
Divides device space into metaslab units and manages metaslabs.
• A volume can contain hundreds of metaslabs.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
Space Map
Used in ZFS
Consider meta-data I/O on very large file systems: full data struc-
tures like bit maps cannot fit in memory
Divides device space into metaslab units and manages metaslabs.
• A volume can contain hundreds of metaslabs.
Each metaslab has associated space map: uses counting algorithm
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
Space Map
Used in ZFS
Consider meta-data I/O on very large file systems: full data struc-
tures like bit maps cannot fit in memory
Divides device space into metaslab units and manages metaslabs.
• A volume can contain hundreds of metaslabs.
Each metaslab has associated space map: uses counting algorithm
Rather than write counting structures to disk, it uses log-structured
file-system techniques to record them.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
Space Map
Used in ZFS
Consider meta-data I/O on very large file systems: full data struc-
tures like bit maps cannot fit in memory
Divides device space into metaslab units and manages metaslabs.
• A volume can contain hundreds of metaslabs.
Each metaslab has associated space map: uses counting algorithm
Rather than write counting structures to disk, it uses log-structured
file-system techniques to record them.
Metaslab activity → load space map into memory in balanced-tree
structure
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
Efficiency and Performance
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 16 / 51
Efficiency and Performance
Disks are the major bottleneck in system performance.
A variety of techniques used to improve the efficiency and perfor-
mance of secondary storage.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 17 / 51
Efficiency (1/2)
Efficiency dependent on disk allocation and directory algorithms.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
Efficiency (1/2)
Efficiency dependent on disk allocation and directory algorithms.
Pre-allocation or as-needed allocation of metadata structures.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
Efficiency (1/2)
Efficiency dependent on disk allocation and directory algorithms.
Pre-allocation or as-needed allocation of metadata structures.
• E.g., Unix inodes are pre-allocated on a volume.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
Efficiency (1/2)
Efficiency dependent on disk allocation and directory algorithms.
Pre-allocation or as-needed allocation of metadata structures.
• E.g., Unix inodes are pre-allocated on a volume.
• Even an empty disk has a percentage of its space lost to inodes.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
Efficiency (1/2)
Efficiency dependent on disk allocation and directory algorithms.
Pre-allocation or as-needed allocation of metadata structures.
• E.g., Unix inodes are pre-allocated on a volume.
• Even an empty disk has a percentage of its space lost to inodes.
• It improves the file system’s performance, but consumes disk space.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
Efficiency (2/2)
Types of data kept in file’s directory entry.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
Efficiency (2/2)
Types of data kept in file’s directory entry.
• E.g., ”last write date” is recorded for some files.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
Efficiency (2/2)
Types of data kept in file’s directory entry.
• E.g., ”last write date” is recorded for some files.
• Updating this information is inefficient for frequently accessed files.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
Efficiency (2/2)
Types of data kept in file’s directory entry.
• E.g., ”last write date” is recorded for some files.
• Updating this information is inefficient for frequently accessed files.
• Benefit against its performance cost?
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
Efficiency (2/2)
Types of data kept in file’s directory entry.
• E.g., ”last write date” is recorded for some files.
• Updating this information is inefficient for frequently accessed files.
• Benefit against its performance cost?
Fixed-size or varying-size data structures.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
Efficiency (2/2)
Types of data kept in file’s directory entry.
• E.g., ”last write date” is recorded for some files.
• Updating this information is inefficient for frequently accessed files.
• Benefit against its performance cost?
Fixed-size or varying-size data structures.
• E.g., Fix length process table: no more process after the process
table becomes full
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
Efficiency (2/2)
Types of data kept in file’s directory entry.
• E.g., ”last write date” is recorded for some files.
• Updating this information is inefficient for frequently accessed files.
• Benefit against its performance cost?
Fixed-size or varying-size data structures.
• E.g., Fix length process table: no more process after the process
table becomes full
• Make them dynamic.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
Performance
Techniques to improve the file system performance:
• Unified buffer cache
• Optimizing sequential access
• Synchronous and asynchronous writes
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 20 / 51
Unified Buffer Cache (1/4)
Buffer cache
• Separate section of main memory for frequently used blocks.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 21 / 51
Unified Buffer Cache (1/4)
Buffer cache
• Separate section of main memory for frequently used blocks.
Page cache
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 21 / 51
Unified Buffer Cache (1/4)
Buffer cache
• Separate section of main memory for frequently used blocks.
Page cache
• Cache file data as pages rather than as file-system blocks.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 21 / 51
Unified Buffer Cache (1/4)
Buffer cache
• Separate section of main memory for frequently used blocks.
Page cache
• Cache file data as pages rather than as file-system blocks.
• More efficient: accesses interface with virtual memory rather than
the file system.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 21 / 51
Unified Buffer Cache (2/4)
Consider the two alternatives for opening and accessing a file:
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
Unified Buffer Cache (2/4)
Consider the two alternatives for opening and accessing a file:
• Memory mapping
• The standard system calls read() and write()
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
Unified Buffer Cache (2/4)
Consider the two alternatives for opening and accessing a file:
• Memory mapping
• The standard system calls read() and write()
Without a unified buffer cache:
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
Unified Buffer Cache (2/4)
Consider the two alternatives for opening and accessing a file:
• Memory mapping
• The standard system calls read() and write()
Without a unified buffer cache:
• The read() and write() go through
the buffer cache.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
Unified Buffer Cache (2/4)
Consider the two alternatives for opening and accessing a file:
• Memory mapping
• The standard system calls read() and write()
Without a unified buffer cache:
• The read() and write() go through
the buffer cache.
• The memory-mapping call, requires
using two caches: the page cache and
the buffer cache.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
Unified Buffer Cache (3/4)
Virtual memory does not interface with the buffer cache.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 23 / 51
Unified Buffer Cache (3/4)
Virtual memory does not interface with the buffer cache.
• The contents of the file in the buffer cache must be copied into the
page cache: double caching
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 23 / 51
Unified Buffer Cache (3/4)
Virtual memory does not interface with the buffer cache.
• The contents of the file in the buffer cache must be copied into the
page cache: double caching
• Waste of memory, CPU and I/O cycles
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 23 / 51
Unified Buffer Cache (3/4)
Virtual memory does not interface with the buffer cache.
• The contents of the file in the buffer cache must be copied into the
page cache: double caching
• Waste of memory, CPU and I/O cycles
• Inconsistency
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 23 / 51
Unified Buffer Cache (4/4)
With unified buffer cache.
Both memory mapping and the read() and write() use the same
page cache.
LRU for block or page replacement.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 24 / 51
Optimizing Sequential Access
Optimizing page replacement in page cache for a file being read or
written sequentially.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 25 / 51
Optimizing Sequential Access
Optimizing page replacement in page cache for a file being read or
written sequentially.
• The most recently used page will be used last, or perhaps never
again.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 25 / 51
Optimizing Sequential Access
Optimizing page replacement in page cache for a file being read or
written sequentially.
• The most recently used page will be used last, or perhaps never
again.
Free-behind: removes a page from the buffer as soon as the next
page is requested.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 25 / 51
Optimizing Sequential Access
Optimizing page replacement in page cache for a file being read or
written sequentially.
• The most recently used page will be used last, or perhaps never
again.
Free-behind: removes a page from the buffer as soon as the next
page is requested.
Read-ahead: a requested page and several subsequent pages are
read and cached.
• Retrieving these data from the disk in one transfer and caching
them saves time.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 25 / 51
Synchronous and Asynchronous Writes
Synchronous writes sometimes requested by applications or needed
by OS.
• No buffering/caching: writes must hit disk before acknowledgement.
Asynchronous writes more common, buffer-able, faster.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 26 / 51
Recovery
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 27 / 51
Recovery
A system crash can cause inconsistencies among on-disk file-system
data structures.
• E.g., directory structures, free-block pointers, and free FCB
pointers.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 28 / 51
Recovery
A system crash can cause inconsistencies among on-disk file-system
data structures.
• E.g., directory structures, free-block pointers, and free FCB
pointers.
• For example, the free FCB count might indicate that an FCB had
been allocated, but the directory structure might not point to the
FCB.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 28 / 51
Recovery
A system crash can cause inconsistencies among on-disk file-system
data structures.
• E.g., directory structures, free-block pointers, and free FCB
pointers.
• For example, the free FCB count might indicate that an FCB had
been allocated, but the directory structure might not point to the
FCB.
Methods to deal with corruption:
• Consistency checking
• Log-structured file systems
• Backup and restore
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 28 / 51
Consistency Checking
To detect a problem, OS scans of all the metadata on each file
system to confirm or deny the consistency of the system.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 29 / 51
Consistency Checking
To detect a problem, OS scans of all the metadata on each file
system to confirm or deny the consistency of the system.
Consistency checking: compares data in directory structure with
data blocks on disk, and tries to fix inconsistencies.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 29 / 51
Consistency Checking
To detect a problem, OS scans of all the metadata on each file
system to confirm or deny the consistency of the system.
Consistency checking: compares data in directory structure with
data blocks on disk, and tries to fix inconsistencies.
Can be slow and sometimes fails.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 29 / 51
Log Structured File Systems (1/2)
Transaction: a set of operations for performing a specific task.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
Log Structured File Systems (1/2)
Transaction: a set of operations for performing a specific task.
Log structured (or journaling) file systems record each metadata
update to the file system as a transaction.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
Log Structured File Systems (1/2)
Transaction: a set of operations for performing a specific task.
Log structured (or journaling) file systems record each metadata
update to the file system as a transaction.
All transactions are written to a log.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
Log Structured File Systems (1/2)
Transaction: a set of operations for performing a specific task.
Log structured (or journaling) file systems record each metadata
update to the file system as a transaction.
All transactions are written to a log.
• A transaction is considered committed, once it is written to the log.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
Log Structured File Systems (1/2)
Transaction: a set of operations for performing a specific task.
Log structured (or journaling) file systems record each metadata
update to the file system as a transaction.
All transactions are written to a log.
• A transaction is considered committed, once it is written to the log.
• Sometimes to a separate device or section of disk.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
Log Structured File Systems (2/2)
The transactions in the log are asynchronously written to the file
system structures.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 31 / 51
Log Structured File Systems (2/2)
The transactions in the log are asynchronously written to the file
system structures.
• When the file system structures are modified, the transaction is
removed from the log.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 31 / 51
Log Structured File Systems (2/2)
The transactions in the log are asynchronously written to the file
system structures.
• When the file system structures are modified, the transaction is
removed from the log.
If the file system crashes, all remaining transactions in the log must
still be performed.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 31 / 51
Log Structured File Systems (2/2)
The transactions in the log are asynchronously written to the file
system structures.
• When the file system structures are modified, the transaction is
removed from the log.
If the file system crashes, all remaining transactions in the log must
still be performed.
Faster recovery from crash, removes chance of inconsistency of
metadata.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 31 / 51
Backup and Restore (1/2)
Back up data from disk to another storage device, such as a mag-
netic tape or other hard disk.
Recovery from the loss of an individual file, or of an entire disk, may
then be a matter of restoring the data from backup.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 32 / 51
Backup and Restore (2/2)
A typical backup schedule:
• Day 1. full backup: copy all files from the disk to a backup medium.
• Day 2. incremental backup: copy all files changed since day 1 to
another medium.
• Day 3. incremental backup: copy all files changed since day 2 to
another medium.
• ...
• Day N. incremental backup: copy all files changed since day N-1 to
another medium. Then go back to day 1.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 33 / 51
NFS
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 34 / 51
The Network File System (1/2)
An implementation and a specification of a software system for ac-
cessing remote files across LANs or WANs.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 35 / 51
The Network File System (1/2)
An implementation and a specification of a software system for ac-
cessing remote files across LANs or WANs.
NFS views a set of interconnected workstations as a set of indepen-
dent machines with independent file systems.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 35 / 51
The Network File System (1/2)
An implementation and a specification of a software system for ac-
cessing remote files across LANs or WANs.
NFS views a set of interconnected workstations as a set of indepen-
dent machines with independent file systems.
The goal is to allow some degree of sharing among these file systems
in a transparent manner.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 35 / 51
The Network File System (1/2)
An implementation and a specification of a software system for ac-
cessing remote files across LANs or WANs.
NFS views a set of interconnected workstations as a set of indepen-
dent machines with independent file systems.
The goal is to allow some degree of sharing among these file systems
in a transparent manner.
Sharing is based on a client-server relationship: either TCP or
UDP/IP.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 35 / 51
The Network File System (2/2)
A remote directory is mounted over a local file system directory.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
The Network File System (2/2)
A remote directory is mounted over a local file system directory.
• The mounted directory looks like an integral subtree of the local file
system.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
The Network File System (2/2)
A remote directory is mounted over a local file system directory.
• The mounted directory looks like an integral subtree of the local file
system.
• It replaces the subtree descending from the local directory.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
The Network File System (2/2)
A remote directory is mounted over a local file system directory.
• The mounted directory looks like an integral subtree of the local file
system.
• It replaces the subtree descending from the local directory.
Specification of the remote directory for the mount operation is
non-transparent.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
The Network File System (2/2)
A remote directory is mounted over a local file system directory.
• The mounted directory looks like an integral subtree of the local file
system.
• It replaces the subtree descending from the local directory.
Specification of the remote directory for the mount operation is
non-transparent.
• The host name of the remote directory has to be provided.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
The Network File System (2/2)
A remote directory is mounted over a local file system directory.
• The mounted directory looks like an integral subtree of the local file
system.
• It replaces the subtree descending from the local directory.
Specification of the remote directory for the mount operation is
non-transparent.
• The host name of the remote directory has to be provided.
• Files in the remote directory can then be accessed in a transparent
manner.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
NFS Mount (1/3)
Three independent file systems of machines named U, S1, and S2.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 37 / 51
NFS Mount (2/3)
Mounting S1:/usr/shared over U:/usr/local.
mount -t nfs S1:/usr/shared /usr/local
U can access any file within the dir1 using /usr/local/dir1.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 38 / 51
NFS Mount (3/3)
Cascading: mount a file system over another file system that is
remotely mounted.
Mounting S2:/usr/dir2 over U:/usr/local/dir1, which is al-
ready remotely mounted from S1.
mount -t nfs S2:/usr/dir2 /usr/local/dir1
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 39 / 51
NFS Mount Protocol (1/2)
Establishes initial logical connection between server and client.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 40 / 51
NFS Mount Protocol (1/2)
Establishes initial logical connection between server and client.
Mount operation includes name of remote directory to be mounted
and name of server machine storing it.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 40 / 51
NFS Mount Protocol (1/2)
Establishes initial logical connection between server and client.
Mount operation includes name of remote directory to be mounted
and name of server machine storing it.
Mount request is mapped to corresponding RPC and forwarded to
mount server.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 40 / 51
NFS Mount Protocol (1/2)
Establishes initial logical connection between server and client.
Mount operation includes name of remote directory to be mounted
and name of server machine storing it.
Mount request is mapped to corresponding RPC and forwarded to
mount server.
Export list: specifies local file systems that server exports for mount-
ing, along with names of machines that are permitted to mount
them.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 40 / 51
NFS Mount Protocol (2/2)
Following a mount request that conforms to its export list, the server
returns a file handle (a key for further accesses).
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 41 / 51
NFS Mount Protocol (2/2)
Following a mount request that conforms to its export list, the server
returns a file handle (a key for further accesses).
File handle: a file-system identifier and an inode number to identify
the mounted directory within the exported file system.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 41 / 51
NFS Mount Protocol (2/2)
Following a mount request that conforms to its export list, the server
returns a file handle (a key for further accesses).
File handle: a file-system identifier and an inode number to identify
the mounted directory within the exported file system.
The mount operation changes only the user’s view and does not
affect the server side.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 41 / 51
NFS Protocol (1/2)
Provides a set of RPCs for remote file operations.
The procedures support the following operations:
• Searching for a file within a directory
• Reading a set of directory entries
• Manipulating links and directories
• Accessing file attributes
• Reading and writing files
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 42 / 51
NFS Protocol (2/2)
NFS servers are stateless; each request has to provide a full set of
arguments.
• NFS V4 is just coming available: very different, stateful
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 43 / 51
NFS Protocol (2/2)
NFS servers are stateless; each request has to provide a full set of
arguments.
• NFS V4 is just coming available: very different, stateful
Modified data must be committed to the server’s disk before results
are returned to the client.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 43 / 51
NFS Protocol (2/2)
NFS servers are stateless; each request has to provide a full set of
arguments.
• NFS V4 is just coming available: very different, stateful
Modified data must be committed to the server’s disk before results
are returned to the client.
The NFS protocol does not provide concurrency-control mecha-
nisms.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 43 / 51
Schematic View of NFS Architecture
NFS is integrated into the OS via a VFS.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 44 / 51
Three Major Layers of NFS Architecture
Unix file-system interface
• Based on the open, read, write, and close calls, and file descriptors.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 45 / 51
Three Major Layers of NFS Architecture
Unix file-system interface
• Based on the open, read, write, and close calls, and file descriptors.
Virtual File System (VFS) layer
• Distinguishes local files from remote ones, and local files are further
distinguished according to their file-system types.
• Calls the NFS protocol procedures for remote requests.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 45 / 51
Three Major Layers of NFS Architecture
Unix file-system interface
• Based on the open, read, write, and close calls, and file descriptors.
Virtual File System (VFS) layer
• Distinguishes local files from remote ones, and local files are further
distinguished according to their file-system types.
• Calls the NFS protocol procedures for remote requests.
NFS service layer
• Implements the NFS protocol.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 45 / 51
NFS Path-Name Translation
It involves the parsing of a path name into separate components.
E.g., /usr/local/dir1/file.txt into usr, local, and dir1.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 46 / 51
NFS Path-Name Translation
It involves the parsing of a path name into separate components.
E.g., /usr/local/dir1/file.txt into usr, local, and dir1.
Performs a separate NFS lookup call for every pair of component
name and directory vnode.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 46 / 51
NFS Path-Name Translation
It involves the parsing of a path name into separate components.
E.g., /usr/local/dir1/file.txt into usr, local, and dir1.
Performs a separate NFS lookup call for every pair of component
name and directory vnode.
To make lookup faster, a directory-name-lookup cache on the
client’s side holds the vnodes for remote directory names.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 46 / 51
NFS Remote Operations (1/2)
Nearly one-to-one correspondence between regular Unix system calls
and the NFS protocol RPCs.
• Except opening and closing files.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 47 / 51
NFS Remote Operations (1/2)
Nearly one-to-one correspondence between regular Unix system calls
and the NFS protocol RPCs.
• Except opening and closing files.
NFS adheres to the remote-service paradigm, but employs buffering
and caching techniques for the sake of performance.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 47 / 51
NFS Remote Operations (2/2)
File-attribute cache: the attribute cache is updated whenever new
attributes arrive from the server.
• When a file is opened, the kernel checks with the remote server
whether to fetch or revalidate the cached attributes.
• By default, discarded after 60 seconds.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 48 / 51
NFS Remote Operations (2/2)
File-attribute cache: the attribute cache is updated whenever new
attributes arrive from the server.
• When a file is opened, the kernel checks with the remote server
whether to fetch or revalidate the cached attributes.
• By default, discarded after 60 seconds.
File-blocks cache
• Cached file blocks are used only if the corresponding cached
attributes are up to date.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 48 / 51
NFS Remote Operations (2/2)
File-attribute cache: the attribute cache is updated whenever new
attributes arrive from the server.
• When a file is opened, the kernel checks with the remote server
whether to fetch or revalidate the cached attributes.
• By default, discarded after 60 seconds.
File-blocks cache
• Cached file blocks are used only if the corresponding cached
attributes are up to date.
Clients do not free delayed-write blocks until the server confirms
that the data have been written to disk
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 48 / 51
Summary
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 49 / 51
Summary
Free space management: bit vector, linked list, grouping, counting,
space maps
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51
Summary
Free space management: bit vector, linked list, grouping, counting,
space maps
Efficiency: pre-allocated vs. as-needed allocation structures, types
of data, fixed-size vs. varying size structures
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51
Summary
Free space management: bit vector, linked list, grouping, counting,
space maps
Efficiency: pre-allocated vs. as-needed allocation structures, types
of data, fixed-size vs. varying size structures
Performance: unified buffer cache, optimizing sequential access,
synchronous and asynchronous writes
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51
Summary
Free space management: bit vector, linked list, grouping, counting,
space maps
Efficiency: pre-allocated vs. as-needed allocation structures, types
of data, fixed-size vs. varying size structures
Performance: unified buffer cache, optimizing sequential access,
synchronous and asynchronous writes
Recovery: consistency checking, log-structured FS, backup and re-
store
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51
Summary
Free space management: bit vector, linked list, grouping, counting,
space maps
Efficiency: pre-allocated vs. as-needed allocation structures, types
of data, fixed-size vs. varying size structures
Performance: unified buffer cache, optimizing sequential access,
synchronous and asynchronous writes
Recovery: consistency checking, log-structured FS, backup and re-
store
NFS: mount protocol, path-name translation, remote operation
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51
Questions?
Acknowledgements
Some slides were derived from Avi Silberschatz slides.
Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 51 / 51

More Related Content

What's hot

Virtual Memory - Part1
Virtual Memory - Part1Virtual Memory - Part1
Virtual Memory - Part1Amir Payberah
 
The structure of process
The structure of processThe structure of process
The structure of processAbhaysinh Surve
 
11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMSkoolkampus
 
Secondary storage structure-Operating System Concepts
Secondary storage structure-Operating System ConceptsSecondary storage structure-Operating System Concepts
Secondary storage structure-Operating System ConceptsArjun Kaimattathil
 
File systems virtualization in windows using mini filter drivers
File systems virtualization in windows using mini filter driversFile systems virtualization in windows using mini filter drivers
File systems virtualization in windows using mini filter driversMPNIKHIL
 
FILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMSFILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMSAbhishek Dutta
 
storage and file structure
storage and file structurestorage and file structure
storage and file structureSheshan Sheniwal
 
Operation System
Operation SystemOperation System
Operation SystemANANTHI1997
 
Storage management in operating system
Storage management in operating systemStorage management in operating system
Storage management in operating systemDeepikaT13
 
File Management in Operating Systems
File Management in Operating SystemsFile Management in Operating Systems
File Management in Operating Systemsvampugani
 
Record storage and primary file organization
Record storage and primary file organizationRecord storage and primary file organization
Record storage and primary file organizationJafar Nesargi
 
Forensic artifacts in modern linux systems
Forensic artifacts in modern linux systemsForensic artifacts in modern linux systems
Forensic artifacts in modern linux systemsGol D Roger
 
Chapter 11 - File System Implementation
Chapter 11 - File System ImplementationChapter 11 - File System Implementation
Chapter 11 - File System ImplementationWayne Jones Jnr
 

What's hot (19)

Virtual Memory - Part1
Virtual Memory - Part1Virtual Memory - Part1
Virtual Memory - Part1
 
The structure of process
The structure of processThe structure of process
The structure of process
 
11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS
 
Secondary storage structure-Operating System Concepts
Secondary storage structure-Operating System ConceptsSecondary storage structure-Operating System Concepts
Secondary storage structure-Operating System Concepts
 
File systems virtualization in windows using mini filter drivers
File systems virtualization in windows using mini filter driversFile systems virtualization in windows using mini filter drivers
File systems virtualization in windows using mini filter drivers
 
FILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMSFILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMS
 
File System Implementation
File System ImplementationFile System Implementation
File System Implementation
 
storage and file structure
storage and file structurestorage and file structure
storage and file structure
 
OSCh14
OSCh14OSCh14
OSCh14
 
Operation System
Operation SystemOperation System
Operation System
 
Storage management in operating system
Storage management in operating systemStorage management in operating system
Storage management in operating system
 
File Management in Operating Systems
File Management in Operating SystemsFile Management in Operating Systems
File Management in Operating Systems
 
Ch12
Ch12Ch12
Ch12
 
File Systems
File SystemsFile Systems
File Systems
 
Record storage and primary file organization
Record storage and primary file organizationRecord storage and primary file organization
Record storage and primary file organization
 
Forensic artifacts in modern linux systems
Forensic artifacts in modern linux systemsForensic artifacts in modern linux systems
Forensic artifacts in modern linux systems
 
Chapter 11 - File System Implementation
Chapter 11 - File System ImplementationChapter 11 - File System Implementation
Chapter 11 - File System Implementation
 
Operation System
Operation SystemOperation System
Operation System
 
Chapter 5
Chapter 5Chapter 5
Chapter 5
 

Similar to File System Implementation - Part2

Similar to File System Implementation - Part2 (20)

File System Implementation - Part1
File System Implementation - Part1File System Implementation - Part1
File System Implementation - Part1
 
File System Interface
File System InterfaceFile System Interface
File System Interface
 
chapter5-file system implementation.ppt
chapter5-file system implementation.pptchapter5-file system implementation.ppt
chapter5-file system implementation.ppt
 
OSCh12
OSCh12OSCh12
OSCh12
 
Ch12 OS
Ch12 OSCh12 OS
Ch12 OS
 
OS_Ch12
OS_Ch12OS_Ch12
OS_Ch12
 
5 data storage_and_indexing
5 data storage_and_indexing5 data storage_and_indexing
5 data storage_and_indexing
 
file management_part2_os_notes.ppt
file management_part2_os_notes.pptfile management_part2_os_notes.ppt
file management_part2_os_notes.ppt
 
File Management
File ManagementFile Management
File Management
 
Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3
 
Allocation and free space management
Allocation and free space managementAllocation and free space management
Allocation and free space management
 
File system
File systemFile system
File system
 
File system implementation
File system implementationFile system implementation
File system implementation
 
Ch11
Ch11Ch11
Ch11
 
Chubby
ChubbyChubby
Chubby
 
Chapter13
Chapter13Chapter13
Chapter13
 
Operating Systems - Implementing File Systems
Operating Systems - Implementing File SystemsOperating Systems - Implementing File Systems
Operating Systems - Implementing File Systems
 
CS403: Operating System :Lec 2 Function of OS.pptx
CS403: Operating System :Lec 2 Function of OS.pptxCS403: Operating System :Lec 2 Function of OS.pptx
CS403: Operating System :Lec 2 Function of OS.pptx
 
Ch11 file system implementation
Ch11   file system implementationCh11   file system implementation
Ch11 file system implementation
 
ch13
ch13ch13
ch13
 

More from Amir Payberah

P2P Content Distribution Network
P2P Content Distribution NetworkP2P Content Distribution Network
P2P Content Distribution NetworkAmir Payberah
 
Data Intensive Computing Frameworks
Data Intensive Computing FrameworksData Intensive Computing Frameworks
Data Intensive Computing FrameworksAmir Payberah
 
The Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics PlatformThe Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics PlatformAmir Payberah
 
The Spark Big Data Analytics Platform
The Spark Big Data Analytics PlatformThe Spark Big Data Analytics Platform
The Spark Big Data Analytics PlatformAmir Payberah
 
Linux Module Programming
Linux Module ProgrammingLinux Module Programming
Linux Module ProgrammingAmir Payberah
 
CPU Scheduling - Part2
CPU Scheduling - Part2CPU Scheduling - Part2
CPU Scheduling - Part2Amir Payberah
 
CPU Scheduling - Part1
CPU Scheduling - Part1CPU Scheduling - Part1
CPU Scheduling - Part1Amir Payberah
 
Process Synchronization - Part2
Process Synchronization -  Part2Process Synchronization -  Part2
Process Synchronization - Part2Amir Payberah
 
Process Synchronization - Part1
Process Synchronization -  Part1Process Synchronization -  Part1
Process Synchronization - Part1Amir Payberah
 
Process Management - Part3
Process Management - Part3Process Management - Part3
Process Management - Part3Amir Payberah
 
Process Management - Part2
Process Management - Part2Process Management - Part2
Process Management - Part2Amir Payberah
 
Process Management - Part1
Process Management - Part1Process Management - Part1
Process Management - Part1Amir Payberah
 
Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2Amir Payberah
 
Introduction to Operating Systems - Part1
Introduction to Operating Systems - Part1Introduction to Operating Systems - Part1
Introduction to Operating Systems - Part1Amir Payberah
 

More from Amir Payberah (20)

P2P Content Distribution Network
P2P Content Distribution NetworkP2P Content Distribution Network
P2P Content Distribution Network
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Data Intensive Computing Frameworks
Data Intensive Computing FrameworksData Intensive Computing Frameworks
Data Intensive Computing Frameworks
 
The Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics PlatformThe Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics Platform
 
The Spark Big Data Analytics Platform
The Spark Big Data Analytics PlatformThe Spark Big Data Analytics Platform
The Spark Big Data Analytics Platform
 
Security
SecuritySecurity
Security
 
Protection
ProtectionProtection
Protection
 
Linux Module Programming
Linux Module ProgrammingLinux Module Programming
Linux Module Programming
 
IO Systems
IO SystemsIO Systems
IO Systems
 
Deadlocks
DeadlocksDeadlocks
Deadlocks
 
CPU Scheduling - Part2
CPU Scheduling - Part2CPU Scheduling - Part2
CPU Scheduling - Part2
 
CPU Scheduling - Part1
CPU Scheduling - Part1CPU Scheduling - Part1
CPU Scheduling - Part1
 
Process Synchronization - Part2
Process Synchronization -  Part2Process Synchronization -  Part2
Process Synchronization - Part2
 
Process Synchronization - Part1
Process Synchronization -  Part1Process Synchronization -  Part1
Process Synchronization - Part1
 
Threads
ThreadsThreads
Threads
 
Process Management - Part3
Process Management - Part3Process Management - Part3
Process Management - Part3
 
Process Management - Part2
Process Management - Part2Process Management - Part2
Process Management - Part2
 
Process Management - Part1
Process Management - Part1Process Management - Part1
Process Management - Part1
 
Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2
 
Introduction to Operating Systems - Part1
Introduction to Operating Systems - Part1Introduction to Operating Systems - Part1
Introduction to Operating Systems - Part1
 

Recently uploaded

Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 

Recently uploaded (20)

Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 

File System Implementation - Part2

  • 1. File System Implementation (Part II) Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 1 / 51
  • 2. Reminder Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 2 / 51
  • 3. File System Layers Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 3 / 51
  • 4. File-System Implementation Based on several on-disk and in-memory structures. On-disk • Boot control block (per volume) • Volume control block (per volume) • Directory structure (per file system) • File control block (per file) In-memory • Mount table • Directory structure • The open-file table (system-wide and per process) • Buffers of the file-system blocks Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 4 / 51
  • 5. Virtual File Systems (VFS) Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 5 / 51
  • 6. Directory Implementation Linear list Hash table Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 6 / 51
  • 7. Allocation Methods Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 7 / 51
  • 8. Free Space Management Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 8 / 51
  • 9. Free-Space Management (1/2) File system maintains free-space list to track available blocks. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 9 / 51
  • 10. Free-Space Management (1/2) File system maintains free-space list to track available blocks. To create a file, OS searches the free-space list for the required amount of space and allocates that space to the new file. • This space is then removed from the free-space list. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 9 / 51
  • 11. Free-Space Management (1/2) File system maintains free-space list to track available blocks. To create a file, OS searches the free-space list for the required amount of space and allocates that space to the new file. • This space is then removed from the free-space list. When a file is deleted, its disk space is added to the free-space list. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 9 / 51
  • 12. Free-Space Management (2/2) Possible techniques: • Bit vector • Linked list • Grouping • Counting • Space maps Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 10 / 51
  • 13. Bit Vector Bit vector or bit map (n blocks) bit[i] = 1: block i is free bit[i] = 0: block i is occupied Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 11 / 51
  • 14. Bit Vector Bit vector or bit map (n blocks) bit[i] = 1: block i is free bit[i] = 0: block i is occupied Block number calculation to find the location of the first free block: (# of bits per word) × (# of 0-value words) + offset of first 1 bit Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 11 / 51
  • 15. Bit Vector Bit vector or bit map (n blocks) bit[i] = 1: block i is free bit[i] = 0: block i is occupied Block number calculation to find the location of the first free block: (# of bits per word) × (# of 0-value words) + offset of first 1 bit Inefficient unless the entire vector is kept in main memory. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 11 / 51
  • 16. Bit Vector Bit vector or bit map (n blocks) bit[i] = 1: block i is free bit[i] = 0: block i is occupied Block number calculation to find the location of the first free block: (# of bits per word) × (# of 0-value words) + offset of first 1 bit Inefficient unless the entire vector is kept in main memory. Easy to get contiguous files. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 11 / 51
  • 17. Linked List Linked list (free-list) • Cannot get contiguous space easily • No waste of space • No need to traverse the entire list Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 12 / 51
  • 18. Grouping A modified version of the linked list approach Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
  • 19. Grouping A modified version of the linked list approach It stores the addresses of n free blocks in the first free block. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
  • 20. Grouping A modified version of the linked list approach It stores the addresses of n free blocks in the first free block. The first n − 1 of these blocks are actually free. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
  • 21. Grouping A modified version of the linked list approach It stores the addresses of n free blocks in the first free block. The first n − 1 of these blocks are actually free. The last block contains the addresses of another n free blocks, and so on. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
  • 22. Grouping A modified version of the linked list approach It stores the addresses of n free blocks in the first free block. The first n − 1 of these blocks are actually free. The last block contains the addresses of another n free blocks, and so on. Easy to get contiguous files. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 13 / 51
  • 23. Counting Because space is frequently contiguously used and freed, with contiguous-allocation allocation, extents, or clustering. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 14 / 51
  • 24. Counting Because space is frequently contiguously used and freed, with contiguous-allocation allocation, extents, or clustering. Keep address of first free block and count of following free blocks. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 14 / 51
  • 25. Counting Because space is frequently contiguously used and freed, with contiguous-allocation allocation, extents, or clustering. Keep address of first free block and count of following free blocks. Free space list then has entries containing addresses and counts. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 14 / 51
  • 26. Space Map Used in ZFS Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
  • 27. Space Map Used in ZFS Consider meta-data I/O on very large file systems: full data struc- tures like bit maps cannot fit in memory Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
  • 28. Space Map Used in ZFS Consider meta-data I/O on very large file systems: full data struc- tures like bit maps cannot fit in memory Divides device space into metaslab units and manages metaslabs. • A volume can contain hundreds of metaslabs. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
  • 29. Space Map Used in ZFS Consider meta-data I/O on very large file systems: full data struc- tures like bit maps cannot fit in memory Divides device space into metaslab units and manages metaslabs. • A volume can contain hundreds of metaslabs. Each metaslab has associated space map: uses counting algorithm Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
  • 30. Space Map Used in ZFS Consider meta-data I/O on very large file systems: full data struc- tures like bit maps cannot fit in memory Divides device space into metaslab units and manages metaslabs. • A volume can contain hundreds of metaslabs. Each metaslab has associated space map: uses counting algorithm Rather than write counting structures to disk, it uses log-structured file-system techniques to record them. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
  • 31. Space Map Used in ZFS Consider meta-data I/O on very large file systems: full data struc- tures like bit maps cannot fit in memory Divides device space into metaslab units and manages metaslabs. • A volume can contain hundreds of metaslabs. Each metaslab has associated space map: uses counting algorithm Rather than write counting structures to disk, it uses log-structured file-system techniques to record them. Metaslab activity → load space map into memory in balanced-tree structure Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 15 / 51
  • 32. Efficiency and Performance Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 16 / 51
  • 33. Efficiency and Performance Disks are the major bottleneck in system performance. A variety of techniques used to improve the efficiency and perfor- mance of secondary storage. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 17 / 51
  • 34. Efficiency (1/2) Efficiency dependent on disk allocation and directory algorithms. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
  • 35. Efficiency (1/2) Efficiency dependent on disk allocation and directory algorithms. Pre-allocation or as-needed allocation of metadata structures. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
  • 36. Efficiency (1/2) Efficiency dependent on disk allocation and directory algorithms. Pre-allocation or as-needed allocation of metadata structures. • E.g., Unix inodes are pre-allocated on a volume. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
  • 37. Efficiency (1/2) Efficiency dependent on disk allocation and directory algorithms. Pre-allocation or as-needed allocation of metadata structures. • E.g., Unix inodes are pre-allocated on a volume. • Even an empty disk has a percentage of its space lost to inodes. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
  • 38. Efficiency (1/2) Efficiency dependent on disk allocation and directory algorithms. Pre-allocation or as-needed allocation of metadata structures. • E.g., Unix inodes are pre-allocated on a volume. • Even an empty disk has a percentage of its space lost to inodes. • It improves the file system’s performance, but consumes disk space. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 18 / 51
  • 39. Efficiency (2/2) Types of data kept in file’s directory entry. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
  • 40. Efficiency (2/2) Types of data kept in file’s directory entry. • E.g., ”last write date” is recorded for some files. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
  • 41. Efficiency (2/2) Types of data kept in file’s directory entry. • E.g., ”last write date” is recorded for some files. • Updating this information is inefficient for frequently accessed files. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
  • 42. Efficiency (2/2) Types of data kept in file’s directory entry. • E.g., ”last write date” is recorded for some files. • Updating this information is inefficient for frequently accessed files. • Benefit against its performance cost? Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
  • 43. Efficiency (2/2) Types of data kept in file’s directory entry. • E.g., ”last write date” is recorded for some files. • Updating this information is inefficient for frequently accessed files. • Benefit against its performance cost? Fixed-size or varying-size data structures. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
  • 44. Efficiency (2/2) Types of data kept in file’s directory entry. • E.g., ”last write date” is recorded for some files. • Updating this information is inefficient for frequently accessed files. • Benefit against its performance cost? Fixed-size or varying-size data structures. • E.g., Fix length process table: no more process after the process table becomes full Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
  • 45. Efficiency (2/2) Types of data kept in file’s directory entry. • E.g., ”last write date” is recorded for some files. • Updating this information is inefficient for frequently accessed files. • Benefit against its performance cost? Fixed-size or varying-size data structures. • E.g., Fix length process table: no more process after the process table becomes full • Make them dynamic. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 19 / 51
  • 46. Performance Techniques to improve the file system performance: • Unified buffer cache • Optimizing sequential access • Synchronous and asynchronous writes Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 20 / 51
  • 47. Unified Buffer Cache (1/4) Buffer cache • Separate section of main memory for frequently used blocks. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 21 / 51
  • 48. Unified Buffer Cache (1/4) Buffer cache • Separate section of main memory for frequently used blocks. Page cache Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 21 / 51
  • 49. Unified Buffer Cache (1/4) Buffer cache • Separate section of main memory for frequently used blocks. Page cache • Cache file data as pages rather than as file-system blocks. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 21 / 51
  • 50. Unified Buffer Cache (1/4) Buffer cache • Separate section of main memory for frequently used blocks. Page cache • Cache file data as pages rather than as file-system blocks. • More efficient: accesses interface with virtual memory rather than the file system. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 21 / 51
  • 51. Unified Buffer Cache (2/4) Consider the two alternatives for opening and accessing a file: Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
  • 52. Unified Buffer Cache (2/4) Consider the two alternatives for opening and accessing a file: • Memory mapping • The standard system calls read() and write() Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
  • 53. Unified Buffer Cache (2/4) Consider the two alternatives for opening and accessing a file: • Memory mapping • The standard system calls read() and write() Without a unified buffer cache: Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
  • 54. Unified Buffer Cache (2/4) Consider the two alternatives for opening and accessing a file: • Memory mapping • The standard system calls read() and write() Without a unified buffer cache: • The read() and write() go through the buffer cache. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
  • 55. Unified Buffer Cache (2/4) Consider the two alternatives for opening and accessing a file: • Memory mapping • The standard system calls read() and write() Without a unified buffer cache: • The read() and write() go through the buffer cache. • The memory-mapping call, requires using two caches: the page cache and the buffer cache. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 22 / 51
  • 56. Unified Buffer Cache (3/4) Virtual memory does not interface with the buffer cache. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 23 / 51
  • 57. Unified Buffer Cache (3/4) Virtual memory does not interface with the buffer cache. • The contents of the file in the buffer cache must be copied into the page cache: double caching Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 23 / 51
  • 58. Unified Buffer Cache (3/4) Virtual memory does not interface with the buffer cache. • The contents of the file in the buffer cache must be copied into the page cache: double caching • Waste of memory, CPU and I/O cycles Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 23 / 51
  • 59. Unified Buffer Cache (3/4) Virtual memory does not interface with the buffer cache. • The contents of the file in the buffer cache must be copied into the page cache: double caching • Waste of memory, CPU and I/O cycles • Inconsistency Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 23 / 51
  • 60. Unified Buffer Cache (4/4) With unified buffer cache. Both memory mapping and the read() and write() use the same page cache. LRU for block or page replacement. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 24 / 51
  • 61. Optimizing Sequential Access Optimizing page replacement in page cache for a file being read or written sequentially. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 25 / 51
  • 62. Optimizing Sequential Access Optimizing page replacement in page cache for a file being read or written sequentially. • The most recently used page will be used last, or perhaps never again. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 25 / 51
  • 63. Optimizing Sequential Access Optimizing page replacement in page cache for a file being read or written sequentially. • The most recently used page will be used last, or perhaps never again. Free-behind: removes a page from the buffer as soon as the next page is requested. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 25 / 51
  • 64. Optimizing Sequential Access Optimizing page replacement in page cache for a file being read or written sequentially. • The most recently used page will be used last, or perhaps never again. Free-behind: removes a page from the buffer as soon as the next page is requested. Read-ahead: a requested page and several subsequent pages are read and cached. • Retrieving these data from the disk in one transfer and caching them saves time. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 25 / 51
  • 65. Synchronous and Asynchronous Writes Synchronous writes sometimes requested by applications or needed by OS. • No buffering/caching: writes must hit disk before acknowledgement. Asynchronous writes more common, buffer-able, faster. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 26 / 51
  • 66. Recovery Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 27 / 51
  • 67. Recovery A system crash can cause inconsistencies among on-disk file-system data structures. • E.g., directory structures, free-block pointers, and free FCB pointers. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 28 / 51
  • 68. Recovery A system crash can cause inconsistencies among on-disk file-system data structures. • E.g., directory structures, free-block pointers, and free FCB pointers. • For example, the free FCB count might indicate that an FCB had been allocated, but the directory structure might not point to the FCB. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 28 / 51
  • 69. Recovery A system crash can cause inconsistencies among on-disk file-system data structures. • E.g., directory structures, free-block pointers, and free FCB pointers. • For example, the free FCB count might indicate that an FCB had been allocated, but the directory structure might not point to the FCB. Methods to deal with corruption: • Consistency checking • Log-structured file systems • Backup and restore Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 28 / 51
  • 70. Consistency Checking To detect a problem, OS scans of all the metadata on each file system to confirm or deny the consistency of the system. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 29 / 51
  • 71. Consistency Checking To detect a problem, OS scans of all the metadata on each file system to confirm or deny the consistency of the system. Consistency checking: compares data in directory structure with data blocks on disk, and tries to fix inconsistencies. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 29 / 51
  • 72. Consistency Checking To detect a problem, OS scans of all the metadata on each file system to confirm or deny the consistency of the system. Consistency checking: compares data in directory structure with data blocks on disk, and tries to fix inconsistencies. Can be slow and sometimes fails. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 29 / 51
  • 73. Log Structured File Systems (1/2) Transaction: a set of operations for performing a specific task. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
  • 74. Log Structured File Systems (1/2) Transaction: a set of operations for performing a specific task. Log structured (or journaling) file systems record each metadata update to the file system as a transaction. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
  • 75. Log Structured File Systems (1/2) Transaction: a set of operations for performing a specific task. Log structured (or journaling) file systems record each metadata update to the file system as a transaction. All transactions are written to a log. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
  • 76. Log Structured File Systems (1/2) Transaction: a set of operations for performing a specific task. Log structured (or journaling) file systems record each metadata update to the file system as a transaction. All transactions are written to a log. • A transaction is considered committed, once it is written to the log. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
  • 77. Log Structured File Systems (1/2) Transaction: a set of operations for performing a specific task. Log structured (or journaling) file systems record each metadata update to the file system as a transaction. All transactions are written to a log. • A transaction is considered committed, once it is written to the log. • Sometimes to a separate device or section of disk. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 30 / 51
  • 78. Log Structured File Systems (2/2) The transactions in the log are asynchronously written to the file system structures. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 31 / 51
  • 79. Log Structured File Systems (2/2) The transactions in the log are asynchronously written to the file system structures. • When the file system structures are modified, the transaction is removed from the log. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 31 / 51
  • 80. Log Structured File Systems (2/2) The transactions in the log are asynchronously written to the file system structures. • When the file system structures are modified, the transaction is removed from the log. If the file system crashes, all remaining transactions in the log must still be performed. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 31 / 51
  • 81. Log Structured File Systems (2/2) The transactions in the log are asynchronously written to the file system structures. • When the file system structures are modified, the transaction is removed from the log. If the file system crashes, all remaining transactions in the log must still be performed. Faster recovery from crash, removes chance of inconsistency of metadata. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 31 / 51
  • 82. Backup and Restore (1/2) Back up data from disk to another storage device, such as a mag- netic tape or other hard disk. Recovery from the loss of an individual file, or of an entire disk, may then be a matter of restoring the data from backup. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 32 / 51
  • 83. Backup and Restore (2/2) A typical backup schedule: • Day 1. full backup: copy all files from the disk to a backup medium. • Day 2. incremental backup: copy all files changed since day 1 to another medium. • Day 3. incremental backup: copy all files changed since day 2 to another medium. • ... • Day N. incremental backup: copy all files changed since day N-1 to another medium. Then go back to day 1. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 33 / 51
  • 84. NFS Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 34 / 51
  • 85. The Network File System (1/2) An implementation and a specification of a software system for ac- cessing remote files across LANs or WANs. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 35 / 51
  • 86. The Network File System (1/2) An implementation and a specification of a software system for ac- cessing remote files across LANs or WANs. NFS views a set of interconnected workstations as a set of indepen- dent machines with independent file systems. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 35 / 51
  • 87. The Network File System (1/2) An implementation and a specification of a software system for ac- cessing remote files across LANs or WANs. NFS views a set of interconnected workstations as a set of indepen- dent machines with independent file systems. The goal is to allow some degree of sharing among these file systems in a transparent manner. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 35 / 51
  • 88. The Network File System (1/2) An implementation and a specification of a software system for ac- cessing remote files across LANs or WANs. NFS views a set of interconnected workstations as a set of indepen- dent machines with independent file systems. The goal is to allow some degree of sharing among these file systems in a transparent manner. Sharing is based on a client-server relationship: either TCP or UDP/IP. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 35 / 51
  • 89. The Network File System (2/2) A remote directory is mounted over a local file system directory. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
  • 90. The Network File System (2/2) A remote directory is mounted over a local file system directory. • The mounted directory looks like an integral subtree of the local file system. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
  • 91. The Network File System (2/2) A remote directory is mounted over a local file system directory. • The mounted directory looks like an integral subtree of the local file system. • It replaces the subtree descending from the local directory. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
  • 92. The Network File System (2/2) A remote directory is mounted over a local file system directory. • The mounted directory looks like an integral subtree of the local file system. • It replaces the subtree descending from the local directory. Specification of the remote directory for the mount operation is non-transparent. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
  • 93. The Network File System (2/2) A remote directory is mounted over a local file system directory. • The mounted directory looks like an integral subtree of the local file system. • It replaces the subtree descending from the local directory. Specification of the remote directory for the mount operation is non-transparent. • The host name of the remote directory has to be provided. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
  • 94. The Network File System (2/2) A remote directory is mounted over a local file system directory. • The mounted directory looks like an integral subtree of the local file system. • It replaces the subtree descending from the local directory. Specification of the remote directory for the mount operation is non-transparent. • The host name of the remote directory has to be provided. • Files in the remote directory can then be accessed in a transparent manner. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 36 / 51
  • 95. NFS Mount (1/3) Three independent file systems of machines named U, S1, and S2. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 37 / 51
  • 96. NFS Mount (2/3) Mounting S1:/usr/shared over U:/usr/local. mount -t nfs S1:/usr/shared /usr/local U can access any file within the dir1 using /usr/local/dir1. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 38 / 51
  • 97. NFS Mount (3/3) Cascading: mount a file system over another file system that is remotely mounted. Mounting S2:/usr/dir2 over U:/usr/local/dir1, which is al- ready remotely mounted from S1. mount -t nfs S2:/usr/dir2 /usr/local/dir1 Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 39 / 51
  • 98. NFS Mount Protocol (1/2) Establishes initial logical connection between server and client. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 40 / 51
  • 99. NFS Mount Protocol (1/2) Establishes initial logical connection between server and client. Mount operation includes name of remote directory to be mounted and name of server machine storing it. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 40 / 51
  • 100. NFS Mount Protocol (1/2) Establishes initial logical connection between server and client. Mount operation includes name of remote directory to be mounted and name of server machine storing it. Mount request is mapped to corresponding RPC and forwarded to mount server. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 40 / 51
  • 101. NFS Mount Protocol (1/2) Establishes initial logical connection between server and client. Mount operation includes name of remote directory to be mounted and name of server machine storing it. Mount request is mapped to corresponding RPC and forwarded to mount server. Export list: specifies local file systems that server exports for mount- ing, along with names of machines that are permitted to mount them. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 40 / 51
  • 102. NFS Mount Protocol (2/2) Following a mount request that conforms to its export list, the server returns a file handle (a key for further accesses). Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 41 / 51
  • 103. NFS Mount Protocol (2/2) Following a mount request that conforms to its export list, the server returns a file handle (a key for further accesses). File handle: a file-system identifier and an inode number to identify the mounted directory within the exported file system. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 41 / 51
  • 104. NFS Mount Protocol (2/2) Following a mount request that conforms to its export list, the server returns a file handle (a key for further accesses). File handle: a file-system identifier and an inode number to identify the mounted directory within the exported file system. The mount operation changes only the user’s view and does not affect the server side. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 41 / 51
  • 105. NFS Protocol (1/2) Provides a set of RPCs for remote file operations. The procedures support the following operations: • Searching for a file within a directory • Reading a set of directory entries • Manipulating links and directories • Accessing file attributes • Reading and writing files Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 42 / 51
  • 106. NFS Protocol (2/2) NFS servers are stateless; each request has to provide a full set of arguments. • NFS V4 is just coming available: very different, stateful Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 43 / 51
  • 107. NFS Protocol (2/2) NFS servers are stateless; each request has to provide a full set of arguments. • NFS V4 is just coming available: very different, stateful Modified data must be committed to the server’s disk before results are returned to the client. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 43 / 51
  • 108. NFS Protocol (2/2) NFS servers are stateless; each request has to provide a full set of arguments. • NFS V4 is just coming available: very different, stateful Modified data must be committed to the server’s disk before results are returned to the client. The NFS protocol does not provide concurrency-control mecha- nisms. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 43 / 51
  • 109. Schematic View of NFS Architecture NFS is integrated into the OS via a VFS. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 44 / 51
  • 110. Three Major Layers of NFS Architecture Unix file-system interface • Based on the open, read, write, and close calls, and file descriptors. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 45 / 51
  • 111. Three Major Layers of NFS Architecture Unix file-system interface • Based on the open, read, write, and close calls, and file descriptors. Virtual File System (VFS) layer • Distinguishes local files from remote ones, and local files are further distinguished according to their file-system types. • Calls the NFS protocol procedures for remote requests. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 45 / 51
  • 112. Three Major Layers of NFS Architecture Unix file-system interface • Based on the open, read, write, and close calls, and file descriptors. Virtual File System (VFS) layer • Distinguishes local files from remote ones, and local files are further distinguished according to their file-system types. • Calls the NFS protocol procedures for remote requests. NFS service layer • Implements the NFS protocol. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 45 / 51
  • 113. NFS Path-Name Translation It involves the parsing of a path name into separate components. E.g., /usr/local/dir1/file.txt into usr, local, and dir1. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 46 / 51
  • 114. NFS Path-Name Translation It involves the parsing of a path name into separate components. E.g., /usr/local/dir1/file.txt into usr, local, and dir1. Performs a separate NFS lookup call for every pair of component name and directory vnode. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 46 / 51
  • 115. NFS Path-Name Translation It involves the parsing of a path name into separate components. E.g., /usr/local/dir1/file.txt into usr, local, and dir1. Performs a separate NFS lookup call for every pair of component name and directory vnode. To make lookup faster, a directory-name-lookup cache on the client’s side holds the vnodes for remote directory names. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 46 / 51
  • 116. NFS Remote Operations (1/2) Nearly one-to-one correspondence between regular Unix system calls and the NFS protocol RPCs. • Except opening and closing files. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 47 / 51
  • 117. NFS Remote Operations (1/2) Nearly one-to-one correspondence between regular Unix system calls and the NFS protocol RPCs. • Except opening and closing files. NFS adheres to the remote-service paradigm, but employs buffering and caching techniques for the sake of performance. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 47 / 51
  • 118. NFS Remote Operations (2/2) File-attribute cache: the attribute cache is updated whenever new attributes arrive from the server. • When a file is opened, the kernel checks with the remote server whether to fetch or revalidate the cached attributes. • By default, discarded after 60 seconds. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 48 / 51
  • 119. NFS Remote Operations (2/2) File-attribute cache: the attribute cache is updated whenever new attributes arrive from the server. • When a file is opened, the kernel checks with the remote server whether to fetch or revalidate the cached attributes. • By default, discarded after 60 seconds. File-blocks cache • Cached file blocks are used only if the corresponding cached attributes are up to date. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 48 / 51
  • 120. NFS Remote Operations (2/2) File-attribute cache: the attribute cache is updated whenever new attributes arrive from the server. • When a file is opened, the kernel checks with the remote server whether to fetch or revalidate the cached attributes. • By default, discarded after 60 seconds. File-blocks cache • Cached file blocks are used only if the corresponding cached attributes are up to date. Clients do not free delayed-write blocks until the server confirms that the data have been written to disk Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 48 / 51
  • 121. Summary Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 49 / 51
  • 122. Summary Free space management: bit vector, linked list, grouping, counting, space maps Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51
  • 123. Summary Free space management: bit vector, linked list, grouping, counting, space maps Efficiency: pre-allocated vs. as-needed allocation structures, types of data, fixed-size vs. varying size structures Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51
  • 124. Summary Free space management: bit vector, linked list, grouping, counting, space maps Efficiency: pre-allocated vs. as-needed allocation structures, types of data, fixed-size vs. varying size structures Performance: unified buffer cache, optimizing sequential access, synchronous and asynchronous writes Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51
  • 125. Summary Free space management: bit vector, linked list, grouping, counting, space maps Efficiency: pre-allocated vs. as-needed allocation structures, types of data, fixed-size vs. varying size structures Performance: unified buffer cache, optimizing sequential access, synchronous and asynchronous writes Recovery: consistency checking, log-structured FS, backup and re- store Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51
  • 126. Summary Free space management: bit vector, linked list, grouping, counting, space maps Efficiency: pre-allocated vs. as-needed allocation structures, types of data, fixed-size vs. varying size structures Performance: unified buffer cache, optimizing sequential access, synchronous and asynchronous writes Recovery: consistency checking, log-structured FS, backup and re- store NFS: mount protocol, path-name translation, remote operation Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 50 / 51
  • 127. Questions? Acknowledgements Some slides were derived from Avi Silberschatz slides. Amir H. Payberah (Tehran Polytechnic) File System Implementation 1393/9/10 51 / 51