Local file systems update


The slides give an overview of what is currently cooking in local Linux file systems and what has been done in the recent past. With btrfs getting stabilized, xfs gaining more traction, ext4 improving, and new storage capabilities bringing new file system requirements, we are in an exciting new era where it can be hard to keep track of recent development. This talk should give you a picture of where we are heading, get you familiar with the new features and capabilities, and give you an idea of how to use them correctly.



  1. Local file systems update
     Lukáš Czerner, Red Hat
     February 23, 2013
  2. Copyright © 2013 Lukáš Czerner, Red Hat. Permission is granted to copy,
     distribute and/or modify this document under the terms of the GNU Free
     Documentation License, Version 1.3 or any later version published by the
     Free Software Foundation; with no Invariant Sections, no Front-Cover
     Texts, and no Back-Cover Texts. A copy of the license is included in the
     COPYING file.
  3. Agenda
     1 Linux file systems overview
     2 Challenges we're facing today
     3 Xfs
     4 Ext4
     5 Btrfs
     6 Questions?
  4. Part I: Linux kernel file systems overview
  5. File systems in the Linux kernel
     - The Linux kernel has a number of file systems
       - Cluster, network, local
       - Special-purpose file systems
       - Virtual file systems
     - Close interaction with other Linux kernel subsystems
       - Memory management
       - Block layer
       - VFS (virtual file system switch)
     - Optional stackable device drivers
       - device mapper
       - mdraid
  6. The Linux I/O Stack Diagram
     [Diagram: the Linux I/O stack as of kernel 3.3, from applications through
     the VFS (block-based, network, pseudo and special-purpose file systems),
     the page cache, the block I/O layer with optional stackable devices
     (lvm, mdraid, drbd, device mapper), the I/O scheduler (cfq, deadline,
     noop) and device drivers (SCSI layers, libata, virtio_blk, nvme, ...)
     down to the physical devices.]
     The Linux I/O Stack Diagram (version 0.1, 2012-03-06), created by
     Werner Fischer and Georg Schönberger, license CC-BY-SA 3.0
     http://www.thomas-krenn.com/en/oss/linux-io-stack-diagram.html
  7. Most active local file systems

     File system   Commits   Developers   Active developers
     Ext4              648          112                  13
     Ext3              105           43                   2
     Xfs               650           61                   8
     Btrfs            1302          114                  21
  8. 8. 14 1/ /0 01 13 1/ /0 01 12 1/ Ext3 Btrfs Xfs Ext4 /0 01 11 1/ /0 01 10 1/ /0 01 09 1/ /0 01Number of lines of code 08 1/ /0 01 07 1/ /0 01 06 1/ /0 01 05 1/ /0 01 80000 70000 60000 50000 40000 30000 20000 10000 Lines of code
  9. Part II: Challenges we're facing today
  10. Scalability
      - Common hardware storage capacity increases
        - You can buy a single 4TB drive for a reasonable price
        - Bigger file system and file sizes
      - Common hardware computing power and parallelism increase
        - More processes/threads accessing the file system
        - Locking issues
      - I/O stack designed for high-latency, low-IOPS devices
        - Similar problems already solved in the networking subsystem
  11. Reliability
      - Scalability and reliability are closely coupled problems
      - Being able to fix your file system
        - In reasonable time
        - With reasonable memory requirements
      - Detect errors before your application does
        - Metadata checksumming
        - Metadata should be self-describing
        - Online file system scrub
  12. New types of storage
      - Non-volatile memory
        - Wear levelling more or less solved in firmware
        - Block layer has its IOPS limitations
        - We can expect bigger erase blocks
      - Thinly provisioned storage
        - Lying to users to get more from expensive storage
        - File systems can throw away most of their locality optimizations
        - Cuts down performance
        - Device mapper dm-thinp target
      - Hierarchical storage
        - Hide inexpensive slow storage behind expensive fast storage
        - Performance depends on working set size
        - Improves performance
        - Device mapper dm-cache target, bcache
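The dm-thinp target is typically driven through LVM. A minimal sketch of overcommitting a thin pool; the volume group name `vg0`, volume names and sizes are hypothetical, and the destructive commands are shown commented out:

```shell
# Thin provisioning lets you promise more space than you own:
# e.g. ten 50 GiB volumes backed by a single 100 GiB pool.
pool_gib=100
vol_gib=50
vols=10
echo "overcommit: $((vol_gib * vols)) GiB of volumes on ${pool_gib} GiB of storage"

# Run as root against real storage (hypothetical VG name "vg0"):
# lvcreate -L ${pool_gib}G -T vg0/thinpool          # create the thin pool
# lvcreate -V ${vol_gib}G -T vg0/thinpool -n vol1   # create a thin volume
# mkfs.ext4 /dev/vg0/vol1                           # put a file system on it
```

Blocks are allocated from the pool only as the file system actually writes them, which is exactly why locality-based allocation decisions in the file system stop mattering.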
  13. Maintainability issues
      - More file systems with different use cases
        - Multiple sets of incompatible user space applications
        - Different sets of features and defaults
        - Each file system has different management requirements
      - Requirements from different types of storage
        - SSD
        - Thin provisioning
        - Bigger sector sizes
      - Deeper storage technology stack
        - mdraid
        - device mapper
        - multipath
      - Having a centralized management tool is incredibly useful
      - Having a central source of information is a must
      - System Storage Manager: http://storagemanager.sf.net
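System Storage Manager hides lvm, mdraid and the per-file-system utilities behind one command. A hypothetical session (device names are assumptions, and the commands need root, so they are shown commented out):

```shell
# One view of all detected devices, pools, volumes and file systems
# ssm list
# Create an lvm-backed volume with an xfs file system across two disks
# ssm create --fstype xfs /dev/sdb /dev/sdc
# Check and resize through the same tool regardless of the backend
# ssm check /dev/lvm_pool/lvol001
# ssm resize -s +10G /dev/lvm_pool/lvol001
```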
  14. Part III: What's new in xfs
  15. Scalability improvements
      - Delayed logging
        - Impressive improvements in metadata modification performance
        - Single-threaded workloads still slower than ext4, but not by much
        - With more threads it scales much better than ext4
        - On-disk format change
      - XFS scales well up to hundreds of terabytes
      - Allocation scalability
        - Free space indexing
        - Locking optimization
      - Pretty much the best choice for beefy configurations with lots of storage
  16. Reliability improvements
      - Metadata checksumming
        - CRC to detect errors
        - Metadata verification as it is written to or read from disk
        - On-disk format change
      - Future work
        - Reverse mapping allocation tree
        - Online transparent error correction
        - Online metadata scrub
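Because the CRCs are an on-disk format change, they have to be chosen at mkfs time. A sketch, assuming an xfsprogs new enough to support the checksummed (v5) format; `/dev/sdX1` is a placeholder scratch device and the commands are destructive, so they are shown commented out:

```shell
# Create an XFS file system with metadata checksumming enabled
# mkfs.xfs -m crc=1 /dev/sdX1
# Mount it and confirm the feature took effect
# mount /dev/sdX1 /mnt
# xfs_info /mnt | grep crc
```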
  17. Part IV: What's new in ext4
  18. Scalability improvements
      - Based on a very old architecture
        - Free space tracked in bitmaps on disk
        - Static metadata positions
        - Limited size of allocation groups
        - Limited file size (16TB)
        - Advantages are a resilient on-disk format and backwards and forwards compatibility
      - Some improvements with the bigalloc feature
        - Groups a number of blocks into clusters
        - A cluster is now the smallest allocation unit
        - Trade-off between performance and space utilization efficiency
      - Extent status tree for tracking delayed extents
        - No longer need to scan the page cache to find delalloc blocks
      - Scalability is very much limited by design, on-disk format and backwards compatibility
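bigalloc is also an mkfs-time decision; the cluster size is a power-of-two multiple of the block size. A sketch with 64 KiB clusters; the device path is a placeholder and the mkfs line is destructive, so it is shown commented out:

```shell
# With a 4 KiB block size, 16 blocks per cluster gives a 64 KiB cluster,
# so the smallest allocation the file system will make is 64 KiB.
block_size=4096
blocks_per_cluster=16
cluster_size=$((block_size * blocks_per_cluster))
echo "cluster size: ${cluster_size} bytes"

# Destructive; /dev/sdX1 is a placeholder:
# mkfs.ext4 -O bigalloc -C ${cluster_size} -b ${block_size} /dev/sdX1
```

Larger clusters mean fewer bitmap bits to track for the same capacity, at the cost of wasting space on small files, which is the trade-off the slide refers to.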
  19. Reliability improvements
      - Better memory utilization in user space tools
        - No longer stores whole bitmaps; converted to extents
        - Biggest advantage for e2fsck
      - Faster file system creation
        - Inode table initialization postponed to the kernel
        - Huge time saver when creating bigger file systems
      - Metadata checksumming
        - CRC to detect errors
        - Not enabled by default
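Both of these are driven from mkfs. A sketch; `/dev/sdX1` is a placeholder, checksumming support was still experimental in e2fsprogs at the time, and the commands are destructive, so they are shown commented out:

```shell
# Leave inode table zeroing to the kernel, done lazily after first mount,
# which is what makes creating big file systems so much faster
# mkfs.ext4 -E lazy_itable_init=1 /dev/sdX1
# Opt in to metadata checksumming explicitly (not enabled by default)
# mkfs.ext4 -O metadata_csum /dev/sdX1
# Inspect which features ended up enabled
# dumpe2fs -h /dev/sdX1 | grep -i features
```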
  20. Part V: What's new in btrfs
  21. Getting stabilized
      - Performance improvements are not where the focus is right now
        - Design-specific performance problems
        - Optimization needed in the future
      - Still under heavy development
        - Not all features are ready or even implemented yet
      - File system stabilization takes a long time
  22. Reliability in btrfs
      - Userspace tools not in very good shape
        - Fsck utility still not fully finished
        - Neither kernel nor userspace handles errors gracefully
      - Very good design to build on
        - Metadata and data checksumming
        - Back references
        - Online file system scrub
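The online scrub reads all data and metadata in the background and validates it against the stored checksums. A sketch, assuming a btrfs file system mounted at `/mnt` (run as root, so shown commented out):

```shell
# Kick off a background scrub of all devices in the file system
# btrfs scrub start /mnt
# Check progress and any checksum errors found so far
# btrfs scrub status /mnt
```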
  23. Resources
      - Linux Weekly News: http://lwn.net
      - Kernel mailing lists: http://vger.kernel.org
        - linux-fsdevel
        - linux-ext4
        - linux-btrfs
        - linux-xfs
      - Linux kernel code: http://kernel.org
      - Linux I/O stack diagram:
        http://www.thomas-krenn.com/en/oss/linux-io-stack-diagram.html
  24. The end. Thanks for listening.