Writing a file system in CPython
Dmitry Alimov
2018
File System
Diagram of the Linux storage stack [1]
User Space: Application
Kernel Space: Syscall Interface, Virtual File System (VFS), file systems (ext2, ext3, ext4, btrfs; NFS, smbfs; FUSE; procfs, tmpfs), Page Cache / Direct I/O, Block Layer, Device Driver
Hardware: Device (HDD, SSD, CD-ROM, NIC, etc.)
FUSE (Filesystem in Userspace)
Available for Linux, FreeBSD, OpenBSD, NetBSD (as puffs), OpenSolaris, Minix 3, Android and macOS [2]
Part of the Linux kernel since version 2.6.14
On Windows, compatibility is provided by libraries and ports [3, 4, 5]
FUSE
Diagram showing how FUSE works [2]
Example uses
GlusterFS: Clustered Distributed Filesystem
GmailFS: Filesystem which stores data as mail in Gmail
SSHFS: Provides access to a remote filesystem through SSH
WikipediaFS: View and edit Wikipedia articles as if they were real files
πfs: A file system that stores all files in the digits of Pi
libfuse API [6]
    struct fuse_operations {
        int (*getattr) (const char *path, struct stat *stbuf);
        ...
        int (*readdir) (const char *path, void *buf,
                        fuse_fill_dir_t filler, off_t offset,
                        struct fuse_file_info *fi);
        ...
        int (*read) (const char *path, char *buf, size_t size,
                     off_t offset, struct fuse_file_info *fi);
        ...
    };
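Python interface to FUSE
fusepy module [7], a simple interface to FUSE and MacFUSE:
    import errno
    from stat import S_IFDIR
    from fuse import FuseOSError, Operations

    class Example(Operations):  # class name is illustrative
        def getattr(self, path, fh=None):
            # Only the root directory exists; anything else is "not found"
            if path != '/':
                raise FuseOSError(errno.ENOENT)
            return {'st_mode': (S_IFDIR | 0o755), 'st_nlink': 2}

        def readdir(self, path, fh=None):
            return ['.', '..']

        def read(self, path, size, offset, fh=None):
            # self.data is assumed to map each path to its bytes
            return self.data[path][offset:offset + size]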
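The three methods above can be combined into a tiny read-only file system. The following is only a sketch: the class name HelloFS, the file /hello.txt and the fallback stand-ins for FuseOSError and Operations are assumptions added so that the logic can be exercised without mounting anything or even having fusepy installed.

```python
import errno
from stat import S_IFDIR, S_IFREG

try:
    from fuse import FuseOSError, Operations
except (ImportError, OSError):
    # Fallback stand-ins (assumption) so the class can be exercised
    # without fusepy or libfuse being installed.
    class FuseOSError(OSError):
        def __init__(self, err):
            super().__init__(err, "FUSE error")

    class Operations:
        pass


class HelloFS(Operations):
    """Read-only file system with a single in-memory file (hypothetical)."""

    def __init__(self):
        self.data = {'/hello.txt': b'Hello, FUSE!\n'}

    def getattr(self, path, fh=None):
        if path == '/':
            return {'st_mode': S_IFDIR | 0o755, 'st_nlink': 2}
        if path in self.data:
            return {'st_mode': S_IFREG | 0o444, 'st_nlink': 1,
                    'st_size': len(self.data[path])}
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh=None):
        # Strip the leading '/' to get directory entry names
        return ['.', '..'] + [name[1:] for name in self.data]

    def read(self, path, size, offset, fh=None):
        # The kernel asks for `size` bytes starting at `offset`
        return self.data[path][offset:offset + size]


fs = HelloFS()
print(fs.readdir('/'))
print(fs.read('/hello.txt', 5, 7))
```

With fusepy installed, such a class would typically be mounted with something like FUSE(HelloFS(), mountpoint, foreground=True).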
PEPFS
Read-only file system that exposes PEPs as files [8]
Uses the GitHub repository with PEPs to get the current PEPs [9]
Implemented in Python using the fusepy module
Lazy reading of PEP files (a specific PEP is downloaded on demand)
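The on-demand download idea can be sketched as a small cache that fetches a file only the first time it is read. This is not the PEPFS implementation: the LazyFiles class and the fake_fetch function are hypothetical, and the real download step is replaced by a stand-in so the example runs offline.

```python
class LazyFiles:
    """Caches file contents, fetching each file only on first access."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable: name -> bytes (hypothetical downloader)
        self._cache = {}

    def read(self, name, size, offset):
        if name not in self._cache:
            # First access: fetch the whole file and keep it in memory
            self._cache[name] = self._fetch(name)
        return self._cache[name][offset:offset + size]


calls = []

def fake_fetch(name):
    # Stand-in for an HTTP download; records every call it receives
    calls.append(name)
    return ('contents of ' + name).encode()


files = LazyFiles(fake_fetch)
files.read('pep-0008.txt', 8, 0)
files.read('pep-0008.txt', 8, 8)   # served from the cache, no second fetch
print(calls)
```

Passing the fetch function in as a parameter keeps the caching logic independent of the network, which also makes it easy to test.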
PEPFS mount
$ ./pepfs.py /tmp/pepfs/
$ mount
...
PEPFS on /tmp/pepfs type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
...
PEPFS example
$ ls -la /tmp/pepfs/
total 4
drwxr-xr-x 2 root root 6943251 Jul 25 00:07 .
drwxrwxrwt 35 root root 4096 Jul 25 00:07 ..
-rw-r--r-- 1 root root 29582 Jul 25 00:07 pep-0001.txt
-rw-r--r-- 1 root root 8214 Jul 25 00:07 pep-0002.txt
-rw-r--r-- 1 root root 2229 Jul 25 00:07 pep-0003.txt
...
-rw-r--r-- 1 root root 81947 Jul 25 00:07 pep-3333.txt
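The permission strings in a listing like the one above come from the st_mode value that getattr returns. As an illustration only (the helper file_attrs is a hypothetical name, not taken from PEPFS), the standard stat module can show how a mode value maps to what ls prints:

```python
import stat
import time


def file_attrs(size, mode=0o644):
    """Build a getattr-style dict for a regular file (sketch)."""
    now = time.time()
    return {
        'st_mode': stat.S_IFREG | mode,   # regular file + permission bits
        'st_nlink': 1,
        'st_size': size,
        'st_ctime': now, 'st_mtime': now, 'st_atime': now,
    }


attrs = file_attrs(29582)
print(stat.filemode(attrs['st_mode']), attrs['st_nlink'], attrs['st_size'])
print(stat.filemode(stat.S_IFDIR | 0o755))
```

The first line printed starts with -rw-r--r--, matching the PEP file entries, and the second is drwxr-xr-x, matching the mount point directory.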
Page Cache
To enable the page cache, set the keep_cache flag in the open() method:
    def open(self, path, flags):
        # With raw_fi=True, the second argument is the raw fuse_file_info struct
        flags.keep_cache = 1
        return 0
Also pass raw_fi=True when creating the mount: FUSE(PEPFS(), ..., raw_fi=True)
NB: the cache is invalidated and the data refreshed only when the file size changes
Questions
https://t.me/spbpython
https://t.me/piterpy_meetup
References:
1. https://en.wikibooks.org/wiki/The_Linux_Kernel/Storage
2. https://en.wikipedia.org/wiki/Filesystem_in_Userspace
3. https://en.wikipedia.org/wiki/Dokan_Library
4. https://github.com/crossmeta/cxfuse
5. http://www.secfs.net/winfsp/
6. https://github.com/libfuse/libfuse
7. https://github.com/fusepy/fusepy
8. https://github.com/delimitry/pepfs
9. https://github.com/python/peps/
