Introduction
One of the key goals for the Windows Subsystem for Linux is to allow users to work with their
files as they would on Linux, while giving full interoperability with files the user already has on
their Windows machine. Unlike a virtual machine, where you have to use network shares or
other solutions to share files between the host and guest OS, WSL has direct access to all your
Windows drives to allow for easy interop.
Windows file systems differ substantially from Linux file systems, and this post looks into how
WSL bridges those two worlds.
File systems on Linux
Linux abstracts file system operations through the Virtual File System (VFS), which provides
both an interface for user mode programs to interact with the file system (through system calls
such as open, read, chmod, stat, etc.) and an interface that file systems have to implement. This
allows multiple file systems to coexist, providing the same operations and semantics, with VFS
giving a single namespace view of all these file systems to the user.
File systems are mounted on different directories in this namespace. For example, on a typical
Linux system your hard drive may be mounted at the root, /, with directories such as /dev, /proc,
/sys, and /mnt/cdrom all mounting different file systems which may be on different devices.
Examples of file systems used on Linux include ext4, ReiserFS, FAT, and others.
VFS implements the various system calls for file system operations by using a number of data
structures such as inodes, directory entries and files, and related callbacks that file systems must
implement.
Inodes
The inode is the central data structure used in VFS. It represents a file system object such as a
regular file, directory, symbolic link, etc. An inode contains information about the file type, size,
permissions, last modified time, and other attributes. For many common Linux disk file systems
such as ext4, the on-disk data structures used to represent file metadata directly correspond to the
inode structure used by the Linux kernel.
While an inode represents a file, it does not represent a file name. A single file may have
multiple names, or hard links, but only one inode.
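The relationship between names and inodes can be observed from user mode. The following is a minimal sketch, assuming a POSIX system and a temporary directory on a file system that supports hard links; the helper name link_info and the file names are made up for illustration.

```python
import os
import tempfile


def link_info(directory):
    """Create a file and a hard link to it; return stat results for both names."""
    original = os.path.join(directory, "original.txt")
    alias = os.path.join(directory, "alias.txt")
    with open(original, "w") as f:
        f.write("hello")
    os.link(original, alias)  # adds a second name for the same inode
    return os.stat(original), os.stat(alias)


with tempfile.TemporaryDirectory() as tmp:
    first, second = link_info(tmp)
    # Both names should report the same inode number.
    print("same inode:", first.st_ino == second.st_ino)
    # The link count is stored in the inode, not in either name.
    print("link count:", first.st_nlink)
```

Both stat calls describe one file system object; only the directory contents differ.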
File systems provide a lookup callback to VFS which is used to retrieve an inode for a particular
file, based on the parent inode and the child name. File systems must implement a number of
other inode operations such as chmod, stat, open, etc.
Directory entries
VFS uses a directory entry cache to represent your file system namespace. Directory entries only
exist in memory, and contain a pointer to the inode for the file. For example, if you have a path
like /home/user/foo, there are directory entries for home, user, and foo, each with a pointer to an
inode. Directory entries are cached for fast lookup, but if an entry is not yet in the cache, the
inode lookup operation is used to retrieve the inode from the file system so a new directory entry
can be created.
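The interaction between the cache and the lookup callback can be sketched with a toy model. This is not how the kernel implements it; ToyFileSystem, DentryCache, and the inode numbers are hypothetical, and the sketch only shows that a second walk of the same path no longer reaches the file system.

```python
class ToyFileSystem:
    """Hypothetical file system that exposes only a lookup callback."""

    def __init__(self, tree):
        self.tree = tree      # {(parent_inode, name): child_inode}
        self.lookups = 0      # number of calls that reached the file system

    def lookup(self, parent_inode, name):
        self.lookups += 1
        return self.tree[(parent_inode, name)]


class DentryCache:
    """In-memory map from (parent inode, name) to an inode number."""

    def __init__(self, fs, root_inode=1):
        self.fs = fs
        self.root_inode = root_inode
        self.entries = {}

    def resolve(self, path):
        inode = self.root_inode
        for name in filter(None, path.split("/")):
            key = (inode, name)
            if key not in self.entries:
                # Cache miss: ask the file system, then remember the answer.
                self.entries[key] = self.fs.lookup(inode, name)
            inode = self.entries[key]
        return inode


fs = ToyFileSystem({(1, "home"): 2, (2, "user"): 3, (3, "foo"): 4})
cache = DentryCache(fs)
print(cache.resolve("/home/user/foo"), fs.lookups)  # first walk fills the cache
print(cache.resolve("/home/user/foo"), fs.lookups)  # second walk adds no lookups
```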
File objects
When an inode is opened, a file object is created for that file which keeps track of things like the
file offset and whether the file was opened for read, write or both. File systems must provide a
number of file operations such as read, write, sync, etc.
File descriptors
Applications refer to file objects through file descriptors. These are numeric values, unique to a
process, that refer to any files the process has open. File descriptors can refer to other types of
objects that provide a file-like interface in Linux, including ttys, sockets, and pipes. Multiple file
descriptors can refer to the same file object, e.g. through use of the dup system call.
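Because the file offset lives in the file object rather than in the descriptor, two descriptors created with dup observe the same offset. A small sketch, assuming a POSIX system; the function name offsets_after_dup and the file name are illustrative only.

```python
import os
import tempfile


def offsets_after_dup(path):
    """Open a file, duplicate the descriptor, and seek through one of them."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.write(fd, b"0123456789")
        fd_copy = os.dup(fd)              # new descriptor, same file object
        try:
            os.lseek(fd, 4, os.SEEK_SET)  # reposition via the first descriptor
            # Query the current offset through each descriptor.
            return (os.lseek(fd, 0, os.SEEK_CUR),
                    os.lseek(fd_copy, 0, os.SEEK_CUR))
        finally:
            os.close(fd_copy)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    print(offsets_after_dup(os.path.join(tmp, "data.bin")))
```

Opening the same path twice would instead create two file objects with independent offsets.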
Special file types
Besides just regular files and directories, Linux supports a number of additional file types. These
include device files, FIFOs, sockets, and symbolic links.
Some of these files affect how paths are parsed. Symbolic links are special files that refer to a
different file or directory, and following them is handled seamlessly by VFS. If you open the
path /foo/bar/baz and bar is a symbolic link to /zed, then you will actually open /zed/baz instead.
Similarly, a directory may be used as a mount point for another file system. In this case, when a
path crosses this directory, all inode operations below the mount point go to the new file system.
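The symbolic link example above can be reproduced in a scratch directory. This sketch assumes a POSIX system where creating symbolic links is permitted; the directory layout mirrors the /foo/bar/baz example but lives under a temporary root, and the helper name is made up.

```python
import os
import tempfile


def open_through_symlink(root):
    """Build root/zed/baz and root/foo/bar -> root/zed, then read foo/bar/baz."""
    zed = os.path.join(root, "zed")
    foo = os.path.join(root, "foo")
    os.mkdir(zed)
    os.mkdir(foo)
    with open(os.path.join(zed, "baz"), "w") as f:
        f.write("contents of zed/baz")
    os.symlink(zed, os.path.join(foo, "bar"))  # bar is a symbolic link to zed

    requested = os.path.join(foo, "bar", "baz")
    with open(requested) as f:                 # the link is followed transparently
        data = f.read()
    return data, os.path.realpath(requested)


with tempfile.TemporaryDirectory() as tmp:
    data, resolved = open_through_symlink(tmp)
    print(data)
    print(resolved)  # path with the symbolic link resolved
```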
Special and pseudo file systems
Linux uses a number of file systems that don’t read files from a disk. TmpFs is used as a
temporary, in-memory file system, whose contents will not be persisted. ProcFs and SysFs both
provide access to kernel information about processes, devices and drivers. These file systems do
not have a disk, network or other device associated with them, and instead are virtualized by the
kernel.
File systems on Windows
Windows generalizes all system resources into objects. These include not just files, but also
things like threads, shared memory sections, and timers, just to name a few. All requests to open
a file ultimately go through the Object Manager in the NT kernel, which routes the request
through the I/O Manager to the correct file system driver. The interface that file system drivers
implement in Windows is more generic and enforces fewer requirements. For example, there is
no common inode structure or anything similar, nor is there a directory entry; instead, file system
drivers such as ntfs.sys are responsible for resolving paths and opening file objects.
File systems in Windows are typically mounted on drive letters like C:, D:, etc., although they
can be mounted on directories in other file systems as well. These drive letters are actually a
construct of Win32, and not something that the Object Manager directly deals with. The Object
Manager keeps a namespace that looks similar to the Linux file system namespace, rooted in \,
with file system volumes represented by device objects with paths like
\Device\HarddiskVolume1.
When you open a file using a path like C:\foo\bar, the Win32 CreateFile call translates this to
an NT path of the form \DosDevices\C:\foo\bar, where \DosDevices\C: is actually a symbolic
link to, for example, \Device\HarddiskVolume4. Therefore, the real full path to the file is
actually \Device\HarddiskVolume4\foo\bar. The object manager resolves each component of
the path, similar to how VFS would in Linux, until it encounters the device object. At this point,
it forwards the request to the I/O manager, which creates an I/O Request Packet (IRP) with the
remaining path, which it sends to the file system driver for the device.
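The resolution steps described above can be modeled in a few lines. This is a toy model rather than the real object manager: the tables SYMLINKS and DEVICES, the function names, and the volume number are illustrative, and paths are written with the usual backslash-separated NT spelling.

```python
# Toy object manager namespace: symbolic links and device objects.
# The volume number is an example value, not the result of a real lookup.
SYMLINKS = {r"\DosDevices\C:": r"\Device\HarddiskVolume4"}
DEVICES = {r"\Device\HarddiskVolume4"}


def win32_to_nt(path):
    """Prefix a drive-letter path, as the Win32 layer is described as doing."""
    return "\\DosDevices\\" + path


def resolve(nt_path):
    """Walk the path one component at a time until a device object is reached.

    Returns (device, remaining_path); the remainder is what the file system
    driver for that device would be asked to open.
    """
    parts = [p for p in nt_path.split("\\") if p]
    current = ""
    for i, part in enumerate(parts, start=1):
        current += "\\" + part
        rest = "".join("\\" + p for p in parts[i:])
        if current in SYMLINKS:
            # Substitute the link target and start parsing again.
            return resolve(SYMLINKS[current] + rest)
        if current in DEVICES:
            return current, rest
    raise FileNotFoundError(nt_path)


device, remaining = resolve(win32_to_nt("C:\\foo\\bar"))
print(device)
print(remaining)
```

In this model the file system driver never sees the drive letter, only the path below the device object.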
File objects
When a file is opened, the object manager creates a file object for it. Instead of file descriptors,
the object manager provides handles to file objects. Handles can actually refer to any object
manager object, not just files.
When you make a system call such as NtReadFile (typically through the Win32 ReadFile function),
the I/O manager again creates an IRP to send down to the file system driver for the file object to
perform the request.
Because there are no inodes or anything similar in NT, most operations on files in Windows
require a file object.
Reparse points
Windows only supports two file types: regular files and directories. Both files and directories
can be reparse points, which are special files that have a fixed header and a block of arbitrary
data. The header includes a tag that identifies the type of reparse point, which must be handled
by a file system filter driver, or for built-in reparse point types, the I/O manager itself.
Reparse points are used to implement symbolic links and mount points. In these cases, the tag
indicates that the reparse point is a symbolic link or mount point, and the data associated with the
reparse point contains the link target, or volume name for mount points. Reparse points can also
be used for other functionality such as the placeholder files used by OneDrive in Windows 8.
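The header-plus-data layout can be illustrated by parsing a buffer by hand. The sketch below assumes the common 8-byte header (a 32-bit tag, a 16-bit data length, and a 16-bit reserved field) and uses the tag values from the Windows SDK headers for symbolic links and mount points. The sample buffer is hand-built and its payload is a placeholder, not the real symbolic link data layout; parse_reparse_buffer is a made-up helper.

```python
import struct

# Tag values as defined in the Windows SDK headers.
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003
IO_REPARSE_TAG_SYMLINK = 0xA000000C

TAG_NAMES = {
    IO_REPARSE_TAG_MOUNT_POINT: "mount point",
    IO_REPARSE_TAG_SYMLINK: "symbolic link",
}

HEADER = struct.Struct("<IHH")  # tag, data length, reserved


def parse_reparse_buffer(buffer):
    """Split a reparse buffer into (tag description, tag-specific data)."""
    tag, length, _reserved = HEADER.unpack_from(buffer)
    data = buffer[HEADER.size:HEADER.size + length]
    return TAG_NAMES.get(tag, "other (0x%08X)" % tag), data


# Hand-built buffer for illustration only.
payload = b"example-target"
sample = HEADER.pack(IO_REPARSE_TAG_SYMLINK, len(payload), 0) + payload
print(parse_reparse_buffer(sample))
```

Whoever handles a given tag (a filter driver or the I/O manager) is the only component that needs to understand the data that follows the header.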
