The evolution of Linux file systems
Gang He
ghe@suse.com
Agenda
• Local file system (LFS)
• Cluster file system (CFS)
• Distributed file system (DFS)
Local file system (LFS)
File system overview
File system concepts
• File descriptor (user space)
• struct file, struct dentry, struct inode, struct address_space (kernel space)
• struct super_block, metadata, file data, buffer/page cache
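A minimal user-space sketch of how these objects relate: open() creates a struct file (reached through the file descriptor), which points to a dentry and an inode, and read() is served through the page cache attached to the inode's address_space. The path /etc/hostname is just an arbitrary example.

/* Sketch: a user-space fd and the kernel objects behind it. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/hostname", O_RDONLY);   /* fd indexes a struct file */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct stat st;
    fstat(fd, &st);                              /* st_ino comes from the inode */
    printf("inode=%lu size=%lld\n",
           (unsigned long)st.st_ino, (long long)st.st_size);

    char buf[128];
    ssize_t n = read(fd, buf, sizeof(buf));      /* served via the page cache */
    if (n > 0)
        printf("read %zd bytes\n", n);

    close(fd);
    return 0;
}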
Ext2 → Ext3 → Ext4
• Ext2 (1993), inspired by UFS; the first popular and
stable Linux file system, but its design is simple.
- File systems were getting bigger: how to look up an entry in a large
directory, how to reduce fsck time after a crash ...
• Ext3 (2001), added journaling, hash-tree directory
indexing, etc.
- File systems kept getting bigger, and these limits had to be removed
- Various more advanced file systems put pressure on ext3 ...
• Ext4 (2008), 48-bit block addressing, no limit on
directory entries, extents, multi-block allocation,
delayed block allocation, online defragmentation, 256-byte
inodes, persistent preallocation (see the sketch below),
write barriers, etc.
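A hedged sketch of persistent preallocation, one of the Ext4 features above, using fallocate(2); the file name disk.img and the 1 GiB size are arbitrary examples.

/* Sketch: persistent preallocation on ext4 via fallocate(2). */
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("disk.img", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Reserve 1 GiB of on-disk blocks up front; ext4 records this as
     * unwritten extents, so later writes avoid fragmentation. */
    if (fallocate(fd, 0, 0, 1024LL * 1024 * 1024) != 0)
        perror("fallocate");

    close(fd);
    return 0;
}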
Nowadays
• Ext4, will continue to be maintained for stability and
historical reasons
• XFS, robust and scalable, good performance on large
storage, shines at handling big files (e.g. virtual
machine images)
• Btrfs, a new design (intended to replace ext4), inspired
by ZFS; carries many enterprise file system features, for
example copy-on-write, built-in RAID and volume
management, snapshot/clone support, dynamic grow and
shrink, SSD support, etc. (see the reflink sketch below)
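A hedged sketch of Btrfs copy-on-write cloning through the FICLONE ioctl (reflink); the qcow2 file names are arbitrary examples, and both files must live on the same Btrfs file system.

/* Sketch: copy-on-write file clone (reflink) with FICLONE. */
#include <fcntl.h>
#include <linux/fs.h>     /* FICLONE */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int src = open("image.qcow2", O_RDONLY);
    int dst = open("image-clone.qcow2", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (src < 0 || dst < 0) {
        perror("open");
        return 1;
    }

    /* Share the source's extents instead of copying data; blocks are
     * duplicated only later, when either file is modified. */
    if (ioctl(dst, FICLONE, src) != 0)
        perror("ioctl(FICLONE)");

    close(src);
    close(dst);
    return 0;
}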
Cluster file system (CFS)
Why CFS
• Independent storage devices (e.g. SAN).
• High availability requirements.
• How to scale out a file system in CPU, memory, and even
network bandwidth.
CFS common points
• POSIX file system semantics.
• Shared disk.
• Distributed lock manager (DLM); a conceptual locking sketch follows this list.
• Cluster manager stacks.
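A conceptual sketch of the DLM idea, not a real libdlm or OCFS2/GFS2 interface: cluster_lock()/cluster_unlock() and the stub bodies below are hypothetical stand-ins that only show how a node serializes access to shared on-disk metadata.

/* Hypothetical DLM-style API; the stubs only make the sketch compile. */
#include <stdio.h>

enum dlm_mode { LOCK_PR, LOCK_EX };   /* protected-read / exclusive */

static int cluster_lock(const char *resource, enum dlm_mode mode)
{
    /* A real implementation would ask the cluster DLM for this lock. */
    printf("lock %s mode=%d\n", resource, mode);
    return 0;
}

static int cluster_unlock(const char *resource)
{
    printf("unlock %s\n", resource);
    return 0;
}

int main(void)
{
    /* Each node serializes access to shared on-disk metadata through
     * the DLM, so concurrent mounts of the same shared disk stay
     * coherent without a central server. */
    cluster_lock("inode-0x1234", LOCK_EX);
    /* ... read, modify, and write the shared metadata block here ... */
    cluster_unlock("inode-0x1234");
    return 0;
}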
CFS future
• Scale out to more nodes, providing higher aggregate I/O
bandwidth.
• Higher availability: online fsck, online
defragmentation, online expand/shrink.
• File system level snapshot, file level clone.
• Tiered storage, SSD support.
• Deduplication.
• ...
Distributed file system (DFS)
Background
• Cost: storage arrays, fabric switches, HBA cards, etc.
are expensive.
• Unified storage space, linear expansion, commodity
hardware.
• Driven by the Internet industry (e.g. search, photo
sharing, big data, etc.).
• The Google File System appeared (2003).
GFS-like DFS (HDFS, MooseFS, KFS)
DFS common points
• Do not strictly comply with POSIX file system semantics;
most implementations run in user space.
• Shared-nothing architecture: metadata and file data are
stored separately, and metadata access is separated from
file data access.
• Each node keeps its own local file system; a local file
holds a logical data block, and each data block has
several replica copies.
• The metadata server usually loads all metadata into
memory at start-up; a log records incremental changes, and
flushing memory to disk (or merging the log with the
previous metadata file) produces a new metadata checkpoint
(see the sketch after this list).
• Other mechanisms: heartbeat, rack awareness, block
allocation policy, file lock management, etc.
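A hedged sketch of the checkpoint-plus-log scheme described above; the file names meta.ckpt/meta.log, the record format, and the in-memory table are invented for illustration and do not come from any particular DFS.

/* Sketch: load checkpoint, replay log, then write a new checkpoint. */
#include <stdio.h>
#include <string.h>

struct meta_rec {                      /* one in-memory metadata entry */
    char path[256];
    long long size;
};

/* Insert a record, or update it in place if the path already exists. */
static int apply(struct meta_rec *tbl, int n, int max, const struct meta_rec *r)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(tbl[i].path, r->path) == 0) {
            tbl[i].size = r->size;
            return n;
        }
    }
    if (n < max)
        tbl[n++] = *r;
    return n;
}

/* Start-up: load the last checkpoint, then replay the incremental log
 * so the in-memory table reflects every change since that checkpoint. */
static int load_metadata(struct meta_rec *tbl, int max)
{
    const char *files[] = { "meta.ckpt", "meta.log" };
    struct meta_rec r;
    int n = 0;
    for (int f = 0; f < 2; f++) {
        FILE *fp = fopen(files[f], "r");
        if (!fp)
            continue;
        while (fscanf(fp, "%255s %lld", r.path, &r.size) == 2)
            n = apply(tbl, n, max, &r);
        fclose(fp);
    }
    return n;
}

/* Periodically: dump memory to a fresh checkpoint and truncate the log,
 * i.e. merge the log into the previous metadata file. */
static void write_checkpoint(const struct meta_rec *tbl, int n)
{
    FILE *ckpt = fopen("meta.ckpt", "w");
    if (!ckpt)
        return;
    for (int i = 0; i < n; i++)
        fprintf(ckpt, "%s %lld\n", tbl[i].path, tbl[i].size);
    fclose(ckpt);
    FILE *log = fopen("meta.log", "w");
    if (log)
        fclose(log);
}

int main(void)
{
    struct meta_rec tbl[1024];
    int n = load_metadata(tbl, 1024);
    printf("loaded %d metadata records\n", n);
    write_checkpoint(tbl, n);
    return 0;
}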
Scale out
• Meta-data server cluster, e.g. GFS2, Ceph.
• Fully symmetric, no central meta-data server, e.g.
GlusterFS (see the hashing sketch after this list).
• Improved cluster management mechanisms:
heartbeat/corosync → ZooKeeper cluster
• I/O flow control, reduced dependence on the meta-data
server, cost control (ECC), etc.
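A hedged sketch of path-hash placement, the core idea behind the fully symmetric designs above (GlusterFS's elastic hashing follows this spirit, though its actual algorithm differs); the brick and path names are arbitrary examples.

/* Sketch: locate a file's server by hashing its path (FNV-1a). */
#include <stdint.h>
#include <stdio.h>

static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

int main(void)
{
    const char *servers[] = { "brick-a", "brick-b", "brick-c", "brick-d" };
    const char *paths[] = { "/vm/disk1.img", "/photos/2013/a.jpg", "/logs/app.log" };

    /* Any client can compute the owner directly from the path, so no
     * meta-data server lookup is needed on the data path. */
    for (int i = 0; i < 3; i++) {
        unsigned owner = fnv1a(paths[i]) % 4;
        printf("%-22s -> %s\n", paths[i], servers[owner]);
    }
    return 0;
}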
Current trends
• Linear scale out.
• Compute/storage hyper-converged systems.
• Flash technology utilization.
• High-speed network support.
• Application awareness (e.g. VM images).
• Deduplication/Compression/Snapshot/Clone.
• Object/Block/File unified storage.