Python Fuse

9,925 views

Published on

Python FUSE, write file-systems in few lines of code.

Published in: Technology

Python Fuse

  1. 1. Python FUSE File-System in USErspace Beyond the Traditional File-SystemsMatteo Bertozzi (Th30z) http://th30z.blogspot.com
  2. 2. Talk OverviewWhat is a File-SystemBrief File-Systems HistoryWhat is FUSEBeyond the Traditional File-SystemAPI OverviewExamples (Finally some code!!!)http://mbertozzi.develer.com/python-fuseQ&A
  3. 3. What is a File-SystemIs a Method of storing and organizing data to make it easy to find and access....to interact with an object You name it, and you say what you want it do. The Filesystem takes the name you give Looks through disk to find the object Gives the object your request to do something.
  4. 4. What is a File-SystemOn Disk Format (...serialized struct)ext2, ext3, reiserfs, btrfs...Namespace(Mapping between name and content)/home/th30z/, /usr/local/share/test.c, ...Runtime Service: open(), read(), write(), ...
  5. 5. (The Origins) ...A bit of HistoryMultics 1965 (File-System Paper)A General-Purpose File System For Secondary StorageUnix Late 1969 User Program User Space Kernel Space System Call Layer The File-System Only One File-System
  6. 6. (The Evolution) ...A bit of HistoryMultics 1965 (File-System Paper)A General-Purpose File System For Secondary StorageUnix Late 1969 User Program User Space Kernel Space System Call Layer (which?) The File-System 1 The File-System 2
  7. 7. (The Evolution) ...A bit of HistoryMultics 1965 (File-System Paper)A General-Purpose File System For Secondary StorageUnix Late 1969 User Program User Space Kernel Space System Call Layer (which?) FS 1 FS 2 FS 3 FS 4 ... FS N
  8. 8. (The Solution) ...A bit of HistoryMultics 1965 (File-System Paper)A General-Purpose File System For Secondary StorageUnix Late 1969Sun Microsystem 1984 User Program User Space Kernel Space System Call Layer Vnode/VFS Layer FS 1 FS 2 FS 3 FS 4 ... FS N
  9. 9. Virtual File-SystemProvides an abstraction within thekernel which allows different C Libraryfilesystem implementations to coexist. (open(), read(), write(), ...) User SpaceProvides the filesystem interface to Kernel Spaceuserspace programs. System Calls (sys_open(), sys_read(), ...) VFS Concepts VFS (vfs_read(), vfs_write(), ...)A super-block object represents afilesystem. Kernel ext2 ReiserFS XFS SupportedI-Nodes are filesystem objects such File-Systems ext3 Reiser4 JFSas regular files, directories, FIFOs, ... ext4 Btrfs HFS+A file object represents a file opened ... ... ...by a process.
  10. 10. Wow, It seems not muchdifficult writing a filesystem
  11. 11. Why File-System are ComplexYou need to know the Kernel (No helper libraries: Qt, Glib, ...)Reduce Disk Seeks / SSD Block limited write cyclesBe consistent, Power Down, Unfinished Write...(Journal, Soft-Updates, Copy-on-Write, ...)Bad Blocks, Disk ErrorDont waste to much space for MetadataExtra Features: Deduplication, Compression,Cryptography, Snapshots…
  12. 12. File-Systems Lines of Code MinixFS 2,000MinixFS Fuse 800 UFS 2,000 UFS Fuse 1,000 FAT 6,000 ext2 8,000 ext3 16,000 ReiserFS 27,000 ext4 30,000 btrfs 50,000 NFS 10,000 8,000 Fuse 7,000 9,000 FtpFS 800 SshFS 2,000 Kernel Space User Space
  13. 13. Building a File-System is DifficultWriting good code is not easy (Bugs, Typo, ...)Writing good code in Kernel Space is muchmore difficult.Too many reboots during the developmentToo many Kernel Panic during RebootWe need more flexibility and Speedups!
  14. 14. FUSE, develop your file-systemwith your favorite language and library in user space
  15. 15. What is FUSEKernel module! (like ext2, ReiserFS, XFS, ...)Allows non-privileged user to create their own file-system without editing the kernel code. (User Space)FUSE is particularly useful for writing "virtual filesystems", that act as a view or translation of anexisting file-system storage device. (Facilitate Disk-Based, Network-Based and Pseudo File-System)Bindings: Python, Objective-C, Ruby, Java, C#, ...
  16. 16. File-Systems in User Space? ...Make File Systems Development Super EasyAll UserSpace Libraries are Available...Debugging ToolsNo Kernel RecompilationNo Machine Reboot!...File-System upgrade/fix2 sec downtime, app restart!
  17. 17. Yeah, ...but what’s FUSE?It’s a File-System with user-space callbacks ntfs-3g sshfs ifuse ChrionFS zfs-fuse YouTubeFS gnome-vfs2 ftpfs cryptoFS RaleighFS U n i x
  18. 18. FUSE Kernel Space and User Space Your FS The FUSE kernel module SshFS lib Fuse and the FUSE library FtpFS communicate via a ... User Space special file descriptor /dev/fuse Kernel Space which is obtained by FUSE opening /dev/fuse ext2 ... VFS ext4 drivers Your Fuse FS ... firmware User Input kernel Btrfs ls -l /myfuse/ lib FUSEKernel VFS FUSE
  19. 19. ...be creativeBeyond the Traditional File-SystemsImapFS: Access to yourmail with grep. Thousand of tools availableSocialFS: Log all your cat/grep/sedsocial network to collectnews/jokes and othersocial things. open() is the most usedYouTubeFS: Watch YouTube function in ourvideo as on your disk. applicationsGMailFS: Use your mailboxas backup disk.
  20. 20. FUSE API Overviewcreate(path, mode) mkdir(path, mode)truncate(path, size) unlink(path)mknod(path, mode, dev) readdir(path)open(path, mode) rmdir(path)write(path, data, offset) rename(opath, npath)read(path, length, offset) link(srcpath, dstpath)release(path) Your Fuse FSfsync(path) User Input ls -l /myfuse/ lib FUSEchmod(path, mode)chown(path, oid, gid) Kernel VFS FUSE
  21. 21. (File Operations) FUSE API Overview Reading Appendingcat /myfuse/test.txt Writing echo World >> /myfuse/test2.txt Truncating getattr() echo Hello > /myfuse/test2.txt getattr() echo Woo > /myfuse/test2.txt getattr() getattr() open() open() create() truncate() read() write() write() open() read() flush() flush() write() release() release() release() flush() release() Removing getattr() unlink() rm /myfuse/test.txt
  22. 22. (Directory Operations) FUSE API Overview Creating Removing mkdir /myfuse/folder Reading rmdir /myfuse/folder getattr() ls /myfuse/folder/ getattr() getattr() mkdir() rmdir() opendir() readdir() releasedir() Other Methods (getattr() is always called) chown th30z:develer /myfuse/test.txt getattr() -> chown() chmod 755 /myfuse/test.txt getattr() -> chmod()ln -s /myfuse/test.txt /myfuse/test-link.txt getattr() -> symlink()mv /myfuse/folder /myfuse/fancy-folder getattr() -> rename()
  23. 23. First Code Example! HTFS (HashTable File-System)
  24. 24. HTFS Overview FS Item/Object Traditional Filesystem Time of last access Metadata Object with Metadata Time of last modification Time of last status change (mode, uid, gid, ...) Protection and file-type (mode) User ID of owner (UID) HashTable (dict) keys are Group ID of owner (GID) paths values are Items. Extended Attributes (Key/Value) Data Item 1Path 1Path 2 Item can be a Regular File orPath 3 Item 2 Directory or FIFO...Path 4 Item 3 Data is raw data or filenamePath 5 Item 4 list if item is a directory.(Disk - Storage HashTable)
  25. 25. HTFS Itemclass Item(object): def __init__(self, mode, uid, gid): # ----------------------------------- Metadata -- self.atime = time.time() # time of last acces self.mtime = self.atime # time of last modification self.ctime = self.atime # time of last status change self.mode = mode # protection and file-type self.uid = uid # user ID of owner self.gid = gid # group ID of owner # Extended Attributes self.xattr = {} # --- Data ----------- This is a File! if stat.S_ISDIR(mode): we’ve metadata self.data = set() else: data and even xattr self.data =
  26. 26. (Data Helper) HTFS Itemdef read(self, offset, length): return self.data[offset:offset+length]def write(self, offset, data): length = len(data) self.data = self.data[:offset] + data + self.data[offset+length:] return lengthdef truncate(self, length): if len(self.data) > length: self.data = self.data[:length] else: self.data += x00 * (length - len(self.data)) ...a couple of utility methods to read/write and interact with data.
  27. 27. HTFS Fuse Operationsclass HTFS(fuse.Fuse): def __init__(self, *args, **kwargs): getattr() is called before fuse.Fuse.__init__(self, *args, **kwargs) any operation. Tells to the VFS if you can access self.uid = os.getuid() self.gid = os.getgid() to the specified file and the “State”. root_dir = Item(0755 | stat.S_IFDIR, self.uid, self.gid) self._storage = {/: root_dir} def getattr(self, path): if not path in self._storage: return -errno.ENOENTFile-System must be initialized with # Lookup Item and fill the stat struct the / directory item = self._storage[path] st = zstat(fuse.Stat()) def main(): st.st_mode = item.mode st.st_uid = item.uid server = HTFS() st.st_gid = item.gid server.main() st.st_atime = item.atime st.st_mtime = item.mtime Your FUSE File-System st.st_ctime = item.ctime st.st_size = len(item.data) is like a Server... return st
  28. 28. (File Operations) HTFS Fuse Operationsdef create(self, path, flags, mode): self._storage[path] = Item(mode | stat.S_IFREG, self.uid, self.gid) self._add_to_parent_dir(path)def truncate(self, path, len): self._storage[path].truncate(len)def read(self, path, size, offset): return self._storage[path].read(offset, size)def write(self, path, buf, offset): return self._storage[path].write(offset, buf) def unlink(self, path): self._remove_from_parent_dir(path)Disk is just a big dictionary... del self._storage[path] ...and files are items def rename(self, oldpath, newpath): key = name item = self._storage.pop(oldpath) self._storage[newpath] = item value = data
  29. 29. (Directory Operations) HTFS Fuse Operationsdef mkdir(self, path, mode): self._storage[path] = Item(mode | stat.S_IFDIR, self.uid, self.gid) self._add_to_parent_dir(path)def rmdir(self, path): self._remove_from_parent_dir(path) del self._storage[path] Directory is a File that containsdef readdir(self, path, offset): dir_items = self._storage[path].data File names for item in dir_items: yield fuse.Direntry(item) as data!def _add_to_parent_dir(self, path): parent_path = os.path.dirname(path) filename = os.path.basename(path) self._storage[parent_path].data.add(filename)
  30. 30. (XAttr Operations) HTFS Fuse Operationsdef setxattr(self, path, name, value, flags): self._storage[path].xattr[name] = valuedef getxattr(self, path, name, size): value = self._storage[path].xattr.get(name, ) if size == 0: # We are asked for size of the value return len(value) return value Extended attributes extenddef listxattr(self, path, size): the basic attributes attrs = self._storage[path].xattr.keys() associated with files and if size == 0: directories in the file system. return len(attrs) + len(.join(attrs)) They are stored as name:data return attrs pairs associated with filedef removexattr(self, path, name): system objects if name in self._storage[path].xattr: del self._storage[path].xattr[name]
  31. 31. (Other Operations) HTFS Fuse Operations Lookup Item, def chmod(self, path, mode): item = self._storage[path] Access to its item.mode = modeinformation/data return or write it. def chown(self, path, uid, gid): item = self._storage[path] This is the item.uid = uid File-System’s Job item.gid = gid def symlink(self, path, newpath): item = Item(0644 | stat.S_IFLNK, self.uid, self.gid) item.data = path self._storage[newpath] = item self._add_to_parent_dir(newpath) Symlinks contains just def readlink(self, path): pointed file path. return self._storage[path].data
  32. 32. Other small Examples
  33. 33. Simulate Tera Byte Filesclass TBFS(fuse.Fuse): Read-Only FS def getattr(self, path): with 1 file st = zstat(fuse.Stat()) if path == /: of 128TiB st.st_mode = 0644 | stat.S_IFDIR st.st_size = 1 return st elif path == /tera.data: No st.st_mode = 0644 | stat.S_IFREG st.st_size = 128 * (2 ** 40) Disk/RAM Space return st return -errno.ENOENT Required! def read(self, path, size, offset): return 0 * size read() Send data only def readdir(self, path, offset): if path == /: when is requested yield fuse.Direntry(tera.data)
  34. 34. X^OR File-Systemdef _xorData(data): data = [chr(ord(c) ^ 10) for c in data] 10101010 ^ return string.join(data, “”) 01010101 =class XorFS(fuse.Fuse): --------- ... 11111111 ^ def write(self, path, buf, offset): data = _xorData(buf) 01010101 = return _writeData(path, offset, data) --------- def read(self, path, length, offset): 10101010 data = _readData(path, offset, length) return _xorData(data) ... res = _xorData(“xor”) print res // “rex” res2 = _xorData(res) print res // “xor”
  35. 35. Dup Write File-Systemclass DupFS(fuse.Fuse): def __init__(self, *args, **kwargs): Write on your Disk ... fd_disk1 = open(‘/dev/hda1’, ...) partition 1 and 2. fd_disk2 = open(‘/dev/hdb5’, ...) fd_log = open(‘/home/th30z/testfs.log’, ...) fd_net = socket.socket(...) Send data ... ... over Network def write(self, path, buf, offset): ... disk_write(fd_disk1, path, offset, buf) Log your disk_write(fd_disk2, path, offset, buf) file-system net_write(fd_net, path, offset, buf) log_write(fd_log, path, offset, buf) operations ... ...do other fancy stuff
  36. 36. One more thing
  37. 37. (File and Folders doesn’t fit)Rethink the File-System I dont’t know where I’ve to place this file... ...Ok, for now Desktop is a good place...
  38. 38. (Mobile/Home Devices)Rethink the File-System Small Devices Small Files EMails, Text... We need to lookup quickly our data. Tags, Full-Text Search... ...Encourage people to view their content as objects.
  39. 39. (Large Clusters, The Cloud...) Rethink the File-SystemDistributed data Scalability Fail over Cluster Rebalancing
  40. 40. Q&A Python FUSEMatteo Bertozzi (Th30z) http://th30z.blogspot.com

×