Published on

Virtual file system by Waqas

Published in: Education, Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide


  1. 1. Linux Virtual File System Peter J. Braam
  2. 2. Aims <ul><li>Present the data structures in Linux VFS </li></ul><ul><li>Provide information about flow of control </li></ul><ul><li>Describe methods and invariants needed to implement a new file system </li></ul><ul><li>Illustrate with some examples </li></ul>
  3. 3. History <ul><li>BSD implemented VFS for NFS: aim dispatch to different filesystems </li></ul><ul><li>VMS had elaborate filesystem </li></ul><ul><li>NT/Win95 have VFS type interfaces </li></ul><ul><li>Newer systems integrate VM with buffer cache. </li></ul>File access VFS ufs nfs Coda disk Venus udp
  4. 4. Linux Filesystems <ul><li>Media based </li></ul><ul><ul><li>ext2 - Linux native </li></ul></ul><ul><ul><li>ufs - BSD </li></ul></ul><ul><ul><li>fat - DOS FS </li></ul></ul><ul><ul><li>vfat - win 95 </li></ul></ul><ul><ul><li>hpfs - OS/2 </li></ul></ul><ul><ul><li>minix - well…. </li></ul></ul><ul><ul><li>Isofs - CDROM </li></ul></ul><ul><ul><li>sysv - Sysv Unix </li></ul></ul><ul><ul><li>hfs - Macintosh </li></ul></ul><ul><ul><li>affs - Amiga Fast FS </li></ul></ul><ul><ul><li>NTFS - NT’s FS </li></ul></ul><ul><ul><li>adfs - Acorn-strongarm </li></ul></ul><ul><li>Network </li></ul><ul><ul><li>nfs </li></ul></ul><ul><ul><li>Coda </li></ul></ul><ul><ul><li>AFS - Andrew FS </li></ul></ul><ul><ul><li>smbfs - LanManager </li></ul></ul><ul><ul><li>ncpfs - Novell </li></ul></ul><ul><li>Special ones </li></ul><ul><ul><li>procfs -/proc </li></ul></ul><ul><ul><li>umsdos - Unix in DOS </li></ul></ul><ul><ul><li>userfs - redirector to user </li></ul></ul>
  5. 5. Linux Filesystems (ctd) <ul><li>Forthcoming: </li></ul><ul><ul><li>devfs - device file system </li></ul></ul><ul><ul><li>DFS - DCE distributed FS </li></ul></ul><ul><li>Varia: </li></ul><ul><ul><li>cfs - crypt filesystem </li></ul></ul><ul><ul><li>cfs - cache filesystem </li></ul></ul><ul><ul><li>ftpfs - ftp filesystem </li></ul></ul><ul><ul><li>mailfs - mail filesystem </li></ul></ul><ul><ul><li>pgfs - Postgres versioning file system </li></ul></ul><ul><li>Linux serves (unrelated to the VFS!) </li></ul><ul><ul><li>NFS - user & kernel </li></ul></ul><ul><ul><li>Coda </li></ul></ul><ul><ul><li>AppleShare - netatalk/CAP </li></ul></ul><ul><ul><li>SMB - samba </li></ul></ul><ul><ul><li>NCP - Novell </li></ul></ul>
  6. 6. Linux is Obsolete Andrew Tanenbaum Usefulness
  7. 7. Linux VFS <ul><li>Multiple interfaces build up VFS: </li></ul><ul><ul><li>files </li></ul></ul><ul><ul><li>dentries </li></ul></ul><ul><ul><li>inodes </li></ul></ul><ul><ul><li>superblock </li></ul></ul><ul><ul><li>quota </li></ul></ul><ul><li>VFS can do all caching & provides utility fctns to FS </li></ul><ul><li>FS provides methods to VFS; many are optional </li></ul>File access ext2fs nfs Coda FS VFS VFS VFS disk udp Venus
  8. 8. User level file access <ul><li>Typical user level types and code: </li></ul><ul><ul><li>pathnames : “/myfile” </li></ul></ul><ul><ul><li>file descriptors : fd = open(“/myfile”…) </li></ul></ul><ul><ul><li>attributes in struct stat: stat(“/myfile”, &mybuf), chmod, chown... </li></ul></ul><ul><ul><li>offsets : write, read, lseek </li></ul></ul><ul><ul><li>directory handles : DIR *dh = opendir(“/mydir”) </li></ul></ul><ul><ul><li>directory entries: struct dirent *ent = readdir(dh) </li></ul></ul>
  9. 9. VFS <ul><li>Manages kernel level file abstractions in one format for all file systems </li></ul><ul><li>Receives system call requests from user level (e.g. write, open, stat, link) </li></ul><ul><li>Interacts with a specific file system based on mount point traversal </li></ul><ul><li>Receives requests from other parts of the kernel, mostly from memory management </li></ul>
  10. 10. File system level <ul><li>Individual File Systems </li></ul><ul><ul><li>responsible for managing file & directory data </li></ul></ul><ul><ul><li>responsible for managing meta-data : timestamps, owners, protection etc </li></ul></ul><ul><ul><li>translates data between </li></ul></ul><ul><ul><ul><li>particular FS data : e.g. disk data, NFS data, Coda/AFS data </li></ul></ul></ul><ul><ul><ul><li>VFS data : attributes etc in standard format </li></ul></ul></ul><ul><ul><li>e.g. nfs_getattr(….) returns attributes in VFS format, acquires attributes in NFS format to do so. </li></ul></ul>
  11. 11. Anatomy of stat system call sys_stat(path, buf) { dentry = namei(path); if ( dentry == NULL ) return -ENOENT; inode = dentry->d_inode; rc =inode->i_op->i_permission(inode); if ( rc ) return -EPERM; rc = inode->i_op->i_getattr(inode, buf); dput(dentry); return rc; } Establish VFS data Call into inode layer of filesystem Call into inode layer of filesystem
  12. 12. Anatomy of fstatfs system call sys_fstatfs(fd, buf) { /* for things like “df” */ file = fget(fd); if ( file == NULL ) return -EBADF; superb = file->f_dentry->d_inode->i_super; rc = superb->sb_op->sb_statfs(sb, buf); return rc; } Call into superblock layer of filesystem Translate fd to VFS data structure
  13. 13. Data structures <ul><li>VFS data structures for: </li></ul><ul><ul><li>VFS handle to the file: inode (BSD: vnode) </li></ul></ul><ul><ul><li>User instantiated file handle: file (BSD: file) </li></ul></ul><ul><ul><li>The whole filesystem: superblock (BSD: vfs) </li></ul></ul><ul><ul><li>A name to inode translation: dentry </li></ul></ul>
  14. 14. Shorthand method notation <ul><li>super block methods: sss_methodname </li></ul><ul><li>inode methods: iii_methodname </li></ul><ul><li>dentry methods: ddd_methodname </li></ul><ul><li>file methods: fff_methodname </li></ul><ul><li>instead of : </li></ul><ul><li>inode i_op lookup we write iii_lookup </li></ul>
  15. 15. namei struct dentry *namei(parent, name) { if (dentry = d_lookup(parent,name)) else ddd_hash(parent, name) ddd_revalidate(dentry) iii_lookup(parent, name) sss_read_inode(…) struct inode *iget(ino, dev) { /* try cache else .. */ } VFS FS
  16. 16. Superblocks <ul><li>Handle metadata only (attributes etc) </li></ul><ul><li>Responsible for retrieving and storing metadata from the FS media or peers </li></ul><ul><li>Struct superblocks hold things like: </li></ul><ul><ul><li>device, blocksize, dirty flags, list of dirty inodes </li></ul></ul><ul><ul><li>super operations </li></ul></ul><ul><ul><li>wait queue </li></ul></ul><ul><ul><li>pointer to the root inode of this FS </li></ul></ul>
  17. 17. Super Operations (sss_) <ul><li>Ops on Inodes : </li></ul><ul><ul><li>read_inode </li></ul></ul><ul><ul><li>put_inode </li></ul></ul><ul><ul><li>write_inode </li></ul></ul><ul><ul><li>delete_inode </li></ul></ul><ul><ul><li>clear_inode </li></ul></ul><ul><ul><li>notify_change </li></ul></ul><ul><li>Superblock manips: </li></ul><ul><ul><li>read_super ( mount ) </li></ul></ul><ul><ul><li>put_super ( unmount ) </li></ul></ul><ul><ul><li>write_super ( unmount ) </li></ul></ul><ul><ul><li>statfs (attributes) </li></ul></ul>
  18. 18. Inodes <ul><li>Inodes are VFS abstraction for the file </li></ul><ul><li>Inode has operations (iii_methods) </li></ul><ul><li>VFS maintains an inode cache, NOT the individual FS’s (compare NT, BSD etc) </li></ul><ul><li>Inodes contain an FS specific area where: </li></ul><ul><ul><li>ext2 stores disk block numbers etc </li></ul></ul><ul><ul><li>AFS would store the FID </li></ul></ul><ul><li>Extraordinary inode ops are good for dealing with stale NFS file handles etc. </li></ul>
  19. 19. What’s inside an inode - 1 list_head i_hash list_head i_list list_head i_dentry int i_count long i_ino int i_dev {m,a,c}time {u,g}id mode size n_link caching Identifies file Usual stuff
  20. 20. What’s inside an inode -2 superblock i_sb inode_ops i_op wait objects, semaphore lock vm_area_struct pipe/socket info page information union { ext2fs_inode_info i_ext2 nfs_inode_info i_nfs coda_inode_info i_coda ..} u Which FS For mmap, networking waiting FS Specific info: blockno’s fids etc
  21. 21. Inode state <ul><li>Inode can be on one or two lists: </li></ul><ul><ul><li>( hash & in_use ) or ( hash & dirty ) or unused </li></ul></ul><ul><ul><li>inode has a use count i_count </li></ul></ul><ul><li>Transitions </li></ul><ul><ul><li>unused  hash : iget calls sss_read_inode </li></ul></ul><ul><ul><li>dirty  in_use : sss_write_inode </li></ul></ul><ul><ul><li>hash  unused : call on sss_clear_inode, but if </li></ul></ul><ul><ul><ul><li>i_nlink = 0: iput calls sss_delete_inode when i_count falls to 0 </li></ul></ul></ul>
  22. 22. Inode Cache Dirty inodes Inode_hashtable 1. iget: if i_count>0 ++ 2. iput: if i_count>1 - - sss_write_inode (sync one) Used inodes Unused inodes sss_read_inode (iget) sss_clear_inode (freeing inos) or sss_delete_inode (iput) media fs only (mark_inode_dirty) 3. free_inodes 4. syncing inodes Players: Fs storage Fs storage Fs storage
  23. 23. Sales Red Hat Software sold 240,000 copies of Red Hat Linux in 1997 and expects to reach 400,000 in 1998. Estimates of installed servers (InfoWorld): - Linux: 7 million - OS/2: 5 million - Macintosh: 1 million
  24. 24. Inode operations (iii_) <ul><li>lookup: return inode </li></ul><ul><ul><li>calls iget </li></ul></ul><ul><li>creation/removal </li></ul><ul><ul><li>create </li></ul></ul><ul><ul><li>link </li></ul></ul><ul><ul><li>unlink </li></ul></ul><ul><ul><li>symlink </li></ul></ul><ul><ul><li>mkdir </li></ul></ul><ul><ul><li>rmdir </li></ul></ul><ul><ul><li>mknod </li></ul></ul><ul><ul><li>rename </li></ul></ul><ul><li>symbolic links </li></ul><ul><ul><li>readlink </li></ul></ul><ul><ul><li>follow link </li></ul></ul><ul><li>pages </li></ul><ul><ul><li>readpage, writepage, updatepage - read or write page. Generic for mediafs. </li></ul></ul><ul><ul><li>bmap - return disk block number of logical block </li></ul></ul><ul><li>special operations </li></ul><ul><ul><li>revalidate - see dentry sect </li></ul></ul><ul><ul><li>truncate </li></ul></ul><ul><ul><li>permission </li></ul></ul>
  25. 25. Dentry world <ul><li>Dentry is a name to inode translation structure </li></ul><ul><li>Cached agressively by VFS </li></ul><ul><li>Eliminates lookups by FS & private caches </li></ul><ul><ul><li>timing on Coda FS: ls -lR 1000 files after priming cache </li></ul></ul><ul><ul><ul><li>linux 2.0.32: 7.2secs </li></ul></ul></ul><ul><ul><ul><li>linux 2.1.92: 0.6secs </li></ul></ul></ul><ul><ul><li>disk fs: less benefit, NFS even more </li></ul></ul><ul><li>Negative entries! </li></ul><ul><li>Namei is dramatically simplified </li></ul>
  26. 26. Inside dentry’s <ul><li>name </li></ul><ul><li>pointer to inode </li></ul><ul><li>pointer to parent dentry </li></ul><ul><li>list head of children </li></ul><ul><li>chains for lots of lists </li></ul><ul><li>use count </li></ul>
  27. 27. Dentry associated lists d_alias chains place : d_instantiate remove : dentry_iput inode I_dentry list head d_child chains place : d_alloc remove : d_prune, d_invalidate, d_put inode i_dentry list head = d_inode pointer = d_parent pointer dentry inode relationship dentry tree relationship Legend: inode dentry
  28. 28. Dcache <ul><li>namei tries cache: d_lookup </li></ul><ul><ul><li>ddd_compare </li></ul></ul><ul><li>Success: ddd_revalidate </li></ul><ul><ul><li>d_invalidate if fails </li></ul></ul><ul><ul><li>proceed if success </li></ul></ul><ul><li>Failure: iii_lookup </li></ul><ul><ul><li>find inode </li></ul></ul><ul><ul><li>iget </li></ul></ul><ul><ul><ul><li>sss_read_inode </li></ul></ul></ul><ul><ul><li>finish: </li></ul></ul><ul><ul><ul><li>d_add </li></ul></ul></ul><ul><ul><li>can give negative entry in dcache </li></ul></ul>dentry_hashtable (d_hash chains) unused dentries (d_lru chains) namei iii_lookup d_add prune d_invalidate d_drop dhash(parent, name) list head
  29. 29. Dentry methods <ul><li>ddd_revalidate: can force new lookup </li></ul><ul><li>ddd_hash: compute hash value of name </li></ul><ul><li>ddd_compare: are names equal? </li></ul><ul><li>ddd_delete, ddd_put, ddd_iput: FS cleanup opportunity </li></ul>
  30. 30. Dentry particulars: <ul><li>ddd_hash and ddd_compare have to deal with extraordinary cases for msdos/vfat: </li></ul><ul><ul><li>case insensitive </li></ul></ul><ul><ul><li>long and short filename pleasantries </li></ul></ul><ul><li>ddd_revalidate -- can force new lookup if inode not in use: </li></ul><ul><ul><li>used for NFS/SMBfs aging </li></ul></ul><ul><ul><li>used for Coda/AFS callbacks </li></ul></ul>
  31. 31. Style Dijkstra probably hates me Linus Torvalds
  32. 32. Memory mapping <ul><li>vm_area structure has </li></ul><ul><ul><li>vm_operations </li></ul></ul><ul><ul><li>inode, addresses etc. </li></ul></ul><ul><li>vm_operations </li></ul><ul><ul><li>map, unmap </li></ul></ul><ul><ul><li>swapin, swapout </li></ul></ul><ul><ul><li>nopage -- read when page isn’t in VM </li></ul></ul><ul><li>mmap </li></ul><ul><ul><li>calls on iii_readpage </li></ul></ul><ul><ul><li>keeps a use count on the inode until unmap </li></ul></ul>