All'ombra del Leviatano (In the Shadow of the Leviathan): Filesystem in Userspace


My talk at the Linux Day 2015 in Rome (in Italian).


  1. Saturday, 24 October 2015 #LDROMA15
  2. All'ombra del Leviatano (In the Shadow of the Leviathan)
  4. Files and filesystems
  5. Die Dateien hat der liebe Gott gemacht, alles andere ist Menschenwerk. (God made the files; everything else is the work of man.) Source: Leopold Kronecker (apocryphal)
  6. file is the new byte
  7. All files are created equal. Source: Anonymous
  8. Everything is a file. Source: Anonymous
  9. All file systems are not created equal. Source: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14)
  10. Foundations
  11. For most users, the filesystem is the most visible aspect of an operating system. Source: Silberschatz & Galvin, Operating System Concepts, 7th ed.
  12. The filesystem consists of two distinct parts: a collection of files, each storing related data, and a directory structure, which organizes and provides information about all the files in the system. Source: Silberschatz & Galvin, Operating System Concepts, 7th ed.
  13. The most important job of UNIX is to provide a filesystem. Source: Ritchie & Thompson, The UNIX Time-Sharing System
  14. A file contains whatever information the user places on it, for example symbolic or binary (object) programs. No particular structuring is expected by the system. Source: Ritchie & Thompson, The UNIX Time-Sharing System
  15. A file does not exist within a particular directory; the directory entry for a file consists merely of its name and a pointer to the information actually describing the file. Source: Ritchie & Thompson, The UNIX Time-Sharing System
  16. There is a threefold advantage in treating I/O devices this way: file and device I/O are as similar as possible; file and device names have the same syntax and meaning, so that a program expecting a file name as a parameter can be passed a device name; finally, special files are subject to the same protection mechanism as regular files. Source: Ritchie & Thompson, The UNIX Time-Sharing System
  17. Perhaps paradoxically, the success of UNIX is largely due to the fact that it was not designed to meet any predefined objectives. Source: Ritchie & Thompson, The UNIX Time-Sharing System
  18. Clarifications
  19. The whole point with "everything is a file" is not that you have some random filename, but the fact that you can use common tools to operate on different things. Source: Linus Torvalds, 8 June 2002
  20. The UNIX philosophy is often quoted as "everything is a file", but that really means everything is a stream of bytes. Source: Linus Torvalds, 8 March 2007
  21. It should be just a "read()", and then people can use general libraries and treat all sources the same. Source: Linus Torvalds, 8 March 2007
  22. The paradox
  23. everything is a file, but the boundary of what counts as a file cannot be stretched ad libitum
  24. the boundary of what counts as a file is fixed by the kernel
  25. the syntax and semantics of the filesystem are fixed by the kernel
  26. The Leviathan
  27. VFS: over 65,000 lines of code already in 2008. Source: Galloway et al., Model-Checking the Linux Virtual File System
  28. a conservative, kernel-centric approach
  29. hard to debug
  30. a non-administrator user simply cannot
  31. ecce spes eius frustrabitur eum et videntibus cunctis praecipitabitur (Behold, his hope shall fail him, and in the sight of all he shall be cast down.) Source: Job 41:1 (Vulgate)
  32. VFS
  33. Abstraction. File: a common model. Data structures: superblock, inode, file, dentry. Operations: object-oriented.
  34. Implementation: disk data structures; memory data structures; disk space management.
  35. Precursors. Earlier VFS implementations include Sun's VFS (in SunOS version 2.0, circa 1985) and IBM and Microsoft's "Installable File System" for IBM OS/2. Source: M. Tim Jones, Anatomy of the Linux virtual filesystem switch
  36. Other paths
  37. Synthetic Files
  38. 9P: Plan 9 Filesystem Protocol
  39. puffs: Pass-to-Userspace Framework File System on NetBSD
  40. A filesystem is a protocol translator: it interprets incoming requests and transforms them into a form suitable to store and retrieve data. Source: Antti Kantee, Send and Receive of File System Protocols
  41. Hurd translators
  42. A translator is simply a normal program acting as an object server and participating in the Hurd's distributed virtual filesystem. Source:
  43. It is so-called because it typically exports a filesystem (although need not: cf. auth, proc and pfinet) and thus translates object invocations into calls appropriate for the backing store (e.g., ext2 filesystem, nfs server, etc.). Source:
  44. Another way of putting it is that it translates from one representation of a data structure into another representation, for example from the on-disk ext2 data layout to a traditional filesystem hierarchy, or from a XML file to a virtual hierarchical manifestation. Source:
  45. A translator is usually registered with a specific filesystem node by using the settrans command. Source:
  46. Translators do not require any special privilege to run. The privilege they require is simply that to access the individual resources they use. Source:
  47. FUSE: Filesystem in Userspace
  48. With FUSE it is possible to implement a fully functional filesystem in a userspace program. Source:
  49. Author: Miklos Szeredi. Licenses: GPL + LGPL
  50. Features include...
  51. simple library API
  52. simple installation (no need to patch or recompile the kernel)
  53. secure implementation
  54. userspace-kernel interface is very efficient
  55. usable by non privileged users Source:
  56. Interaction through a file (again!): /dev/fuse.
  57. FUSE is a userspace filesystem framework. It consists of a kernel module (fuse.ko), a userspace library (libfuse.*) and a mount utility (fusermount). Source:
  58. One of the most important features of FUSE is allowing secure, non-privileged mounts. This opens up new possibilities for the use of filesystems. A good example is sshfs: a secure network filesystem using the sftp protocol. Source:
  59. Since the mount() system call is a privileged operation, a helper program (fusermount) is needed, which is installed setuid root. Source:
  60. Vocabulary
  61. Userspace filesystem: A filesystem in which data and metadata are provided by an ordinary userspace process. The filesystem can be accessed normally through the kernel interface. Source:
  62. Filesystem daemon: The process(es) providing the data and metadata of the filesystem. Source:
  63. Non-privileged mount (or user mount): A userspace filesystem mounted by a non-privileged (non-root) user. The filesystem daemon is running with the privileges of the mounting user. Source:
  64. Filesystem connection: A connection between the filesystem daemon and the kernel. The connection exists until either the daemon dies, or the filesystem is unmounted. Source:
  65. Mount owner: The user who does the mounting. Source:
  66. User: The user who is performing filesystem operations. Source:
  67. hello.c

```c
/*
  FUSE: Filesystem in Userspace
  Copyright (C) 2001-2007  Miklos Szeredi

  This program can be distributed under the terms of the GNU GPL.
  See the file COPYING.
*/

/* The callbacks below use the FUSE 2.x high-level API (no fuse_file_info
 * in getattr, 4-argument filler), which FUSE_USE_VERSION 26 selects. */
#define FUSE_USE_VERSION 26

#include <fuse.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>

static const char *hello_str = "Hello World!\n";
static const char *hello_path = "/hello";

/* Attributes: "/" is a directory, "/hello" a read-only regular file. */
static int hello_getattr(const char *path, struct stat *stbuf)
{
    int res = 0;

    memset(stbuf, 0, sizeof(struct stat));
    if (strcmp(path, "/") == 0) {
        stbuf->st_mode = S_IFDIR | 0755;
        stbuf->st_nlink = 2;
    } else if (strcmp(path, hello_path) == 0) {
        stbuf->st_mode = S_IFREG | 0444;
        stbuf->st_nlink = 1;
        stbuf->st_size = strlen(hello_str);
    } else
        res = -ENOENT;

    return res;
}

/* Lists the root directory: ".", ".." and "hello". */
static int hello_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                         off_t offset, struct fuse_file_info *fi)
{
    (void) offset;
    (void) fi;

    if (strcmp(path, "/") != 0)
        return -ENOENT;

    filler(buf, ".", NULL, 0);
    filler(buf, "..", NULL, 0);
    filler(buf, hello_path + 1, NULL, 0);

    return 0;
}

/* Only "/hello" can be opened, and only for reading. */
static int hello_open(const char *path, struct fuse_file_info *fi)
{
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;

    if ((fi->flags & O_ACCMODE) != O_RDONLY)
        return -EACCES;

    return 0;
}

/* Copies up to size bytes of hello_str starting at offset; 0 means EOF. */
static int hello_read(const char *path, char *buf, size_t size, off_t offset,
                      struct fuse_file_info *fi)
{
    size_t len;
    (void) fi;
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;

    len = strlen(hello_str);
    if (offset < len) {
        if (offset + size > len)
            size = len - offset;
        memcpy(buf, hello_str + offset, size);
    } else
        size = 0;

    return size;
}

static struct fuse_operations hello_oper = {
    .getattr = hello_getattr,
    .readdir = hello_readdir,
    .open    = hello_open,
    .read    = hello_read,
};

int main(int argc, char *argv[])
{
    return fuse_main(argc, argv, &hello_oper, NULL);
}
```
  75. Bestiary
  76. CephFS: FUSE as a nursery, a breeding ground for young filesystems. CephFS has been in the kernel since version 2.6.34.
  77. Couchfuse: Couchfuse is a FUSE filesystem that exposes CouchDB databases as filesystem folders. Source:
  78. elfs: A simple (FUSE) filesystem on top of ELF objects. Author: Guillaume Leconte. Source:
  79. Sample session:

```
$ elfs `which fdup` /tmp/elf
$ ls -l /tmp/elf/
total 0
drw-r--r-- 1 root root 0 Jan 1 1970 header
drw-r--r-- 1 root root 0 Jan 1 1970 libs
drw-r--r-- 1 root root 0 Jan 1 1970 sections
```

  80. Extension to other binary formats; abstraction from the format; an interface to exec()
  81. etcd-fs: A replicated filesystem on top of etcd. Author: Jonathan Leibiusky. Source:
  82. fusepy: Simple ctypes bindings for FUSE. Author: Terence Honles. Source:
  83. GlusterFS: GlusterFS is a scalable network filesystem. Using common off-the-shelf hardware, you can create large, distributed storage solutions for media streaming, data analysis, and other data- and bandwidth-intensive tasks. Source:
  84. PNGdrive: PNG meets Steganography meets Fuse: the easiest way to have plausible deniability. Source:
  85. WikipediaFS: WikipediaFS is a virtual filesystem which allows users to view and edit Wikipedia articles as if they were real files on a local disk drive. Source:
  86. Colophon: Presentation written with vim and Hovercraft! on Ubuntu Saucy. Featuring Google Fonts: Libre Baskerville, Racing Sans One, Satisfy.
  87. exit()
  88. Roberto Reale