Osdc2011.ext4btrfs.talk

Some technical information on EXT4 and BTRFS (Spring 2011)

  1. Quo vadis Linux File Systems: Ext4 or BTRFS
      Udo Seidel
  2. Agenda
      ● Introduction/motivation
      ● ext4 – the new member of the extfs family
        ● Facts, specs
        ● Migration
      ● BTRFS – the newbie ... the hope
        ● Facts, specs
        ● Migration
      ● Summary
      OSDC 2011
  3. Linux file systems
      ● More than 50 file systems shipped with the Linux kernel
        ● Local
        ● Remote
        ● Cluster
        ● ...
      ● A few as standard for the root directory
        ● ext2, ext3
        ● XFS
  4. Linux file systems – challenges
      ● ReiserFS phased out
      ● Limitations of ext3
      ● Changes in recent Enterprise distributions
  5. Linux file systems – new players
      ● New version of the ext family -> ext4
        ● Marked as stable
        ● Shipped with Enterprise distributions
      ● New approach with BTRFS
        ● Still experimental
        ● Default in some projects, e.g. MeeGo
  6. 4th extended file system
      ● Shipped since 2.6.19
      ● Stable since 2.6.28
      ● Designed to overcome the limits of ext3
        ● Size
        ● Performance
  7. Ext4 – history
      ● Successor of ext3
      ● Started as a set of patches for ext3
      ● Later forked
        ● First called ext3dev (sometimes ext4dev)
        ● No impact on ext3 stability
        ● Fewer dependencies on ext3 code
        ● Source code easier to maintain
  8. Ext4 – facts
      ● Max volume size: 1 EByte = 1024 PByte
      ● Max file size: 16 TByte
      ● Max length of file name: 255 Bytes
      ● Support of extended attributes
      ● No encryption
      ● No real compression support
      ● Partially 64-bit
  9. Ext4 – starting from known
      ● Known tools
        ● mkfs
        ● fsck
        ● tune2fs
        ● e2label
  10. Ext4 – global structure I
      ● Entry point -> superblock
        ● Block size
        ● Number of blocks and inodes
        ● Number of free blocks and inodes
      ● Disk divided into block groups
        ● Backup of superblock
        ● Block group description (inode/block bitmaps)
  11. Ext4 – global structure II
      ● Similar to ext3
      ● Inherits some ext3 limitations
        ● Number of inodes per block group
      ● Second type of block groups => flexible
        ● Flexible placement of bitmaps
      ● Bigger inodes to store additional information
        ● 256 Bytes
        ● Nanosecond timestamps
  12. Ext4 – from blocks to extents
      ● Common addressing scheme for modern file systems
      ● Extent = contiguous area of blocks
        ● Less management information needed
        ● Fewer metadata operations
        ● Less “fragmentation”
      ● Requires a change of the on-disk format
  13. Ext4 – extent I
      ● 15 bit for extent size
        ● Block size of 4 KByte => 128 MByte
      ● 1 bit for extent initialization information

      struct ext4_extent {
          __le32  ee_block;    /* first logical block extent covers */
          __le16  ee_len;      /* number of blocks covered by extent */
          __le16  ee_start_hi; /* high 16 bits of physical block */
          __le32  ee_start_lo; /* low 32 bits of physical block */
      };
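To make the 12-byte layout concrete, here is a minimal Python sketch that packs and unpacks the `ext4_extent` structure shown on the slide. The helper names (`pack_extent`, `unpack_extent`) and the sample values are illustrative assumptions; only the field order and the little-endian widths come from the struct above. The sketch keeps the raw 16-bit `ee_len` value and does not interpret the initialization bit.

```python
import struct

# Little-endian layout of struct ext4_extent as shown on the slide:
# __le32 ee_block, __le16 ee_len, __le16 ee_start_hi, __le32 ee_start_lo
EXT4_EXTENT = struct.Struct("<IHHI")


def pack_extent(logical_block, length, physical_block):
    """Serialize one extent; the physical block is split into high 16 and low 32 bits."""
    if physical_block >= 1 << 48:
        raise ValueError("physical block number does not fit in 48 bits")
    start_hi = physical_block >> 32
    start_lo = physical_block & 0xFFFFFFFF
    return EXT4_EXTENT.pack(logical_block, length, start_hi, start_lo)


def unpack_extent(raw):
    """Return (logical_block, raw_length, physical_block) from 12 raw bytes."""
    ee_block, ee_len, start_hi, start_lo = EXT4_EXTENT.unpack(raw)
    return ee_block, ee_len, (start_hi << 32) | start_lo


# Hypothetical sample values, chosen only to exercise all three fields.
raw = pack_extent(logical_block=0, length=2048, physical_block=0x123456789ABC)
print(len(raw), unpack_extent(raw))
```

Splitting the physical block number into `ee_start_hi` and `ee_start_lo` is what yields the 48-bit file system address discussed on the next slide.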
  14. Ext4 – extent II
      ● 32 bit for block addresses inside a file
        ● Block size of 4 KByte => 16 TByte
      ● 48 (!) bit for block addresses of the file system
        ● Block size of 4 KByte => 1 EByte
  15. Ext4 – extent III
      ● 60 Byte for extent information
        ● 12 Byte for extent header
        ● 12 Byte for extent structure
          – Up to 4 extents per inode
          – Max. 512 MByte directly addressable (ext3: 48 KByte)
          – Different scheme for bigger files
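The size limits quoted on the three extent slides all follow from the bit widths and the 4 KByte block size. The short Python sketch below redoes that arithmetic; the constant names are made up for illustration, and the figure of 12 direct block pointers for ext3 is an assumption taken from the classic ext2/3 inode layout rather than from the slides.

```python
BLOCK_SIZE = 4 * 1024                      # 4 KByte blocks, as assumed on the slides
KIB, MIB, TIB, EIB = 2**10, 2**20, 2**40, 2**60

max_extent_bytes = 2**15 * BLOCK_SIZE      # 15-bit extent length
max_file_bytes = 2**32 * BLOCK_SIZE        # 32-bit logical block number
max_volume_bytes = 2**48 * BLOCK_SIZE      # 48-bit physical block number

INODE_EXTENT_AREA = 60                     # bytes available in the inode
HEADER_BYTES = 12                          # extent header
EXTENT_BYTES = 12                          # one extent structure
extents_in_inode = (INODE_EXTENT_AREA - HEADER_BYTES) // EXTENT_BYTES

direct_ext4_bytes = extents_in_inode * max_extent_bytes
direct_ext3_bytes = 12 * BLOCK_SIZE        # assumption: 12 direct block pointers in ext3

print(max_extent_bytes // MIB, "MByte per extent")
print(max_file_bytes // TIB, "TByte per file")
print(max_volume_bytes // EIB, "EByte per volume")
print(extents_in_inode, "extents in the inode,", direct_ext4_bytes // MIB, "MByte direct")
print(direct_ext3_bytes // KIB, "KByte direct in ext3")
```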
  16. Ext4 – extent tree I
      ● For files > 512 MByte
      ● B+ tree
      ● Extent structures only at leaf nodes
      ● New element: extent index
        ● Same header structure as for data extents
        ● Points to a data block
        ● Data block contains either extent indexes or extent structures
  17. Ext4 – extent tree II (figure)
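How far the extent tree scales can be estimated with a little arithmetic. The sketch below is a rough model under stated assumptions: a tree block carries the same 12-byte header as the inode area, an index entry is 12 bytes like an extent structure (the slides only give the extent size), and blocks are 4 KByte. The function name `max_extents` is illustrative.

```python
BLOCK_SIZE = 4 * 1024
HEADER_BYTES = 12
ENTRY_BYTES = 12            # assumption: index entries are as large as extent structures
ROOT_ENTRIES = 4            # entries that fit into the 60-byte inode area
MAX_EXTENT_BYTES = 2**15 * BLOCK_SIZE

# Entries (indexes or extents) that fit into one tree block after its header.
ENTRIES_PER_BLOCK = (BLOCK_SIZE - HEADER_BYTES) // ENTRY_BYTES


def max_extents(depth):
    """Upper bound on extents reachable with `depth` levels of tree blocks below the inode."""
    return ROOT_ENTRIES * ENTRIES_PER_BLOCK ** depth


for depth in range(3):
    extents = max_extents(depth)
    gib = extents * MAX_EXTENT_BYTES // 2**30
    print(f"depth {depth}: {extents} extents, up to {gib} GiB if every extent is full")
```

In this model a single level of tree blocks already covers far more than the 512 MByte reachable from the inode alone; the 16 TByte file size limit still applies regardless of tree depth.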
  18. Ext4 – from extents to blocks
      ● In the end, blocks still have to be allocated
      ● New features
        ● Multi-block allocation
        ● Delayed allocation
        ● Persistent allocation
  19. Ext4 – multi-block allocation
      ● Ext3: only one block per call
        ● 12800 calls for a 50 MByte file
      ● Ext4: multiple blocks per call
        ● Less overhead
        ● Contiguous physical location of data
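The figure of 12800 calls is simply the file size divided by the block size. The sketch below reproduces it and contrasts it with a multi-block allocator; the request size of 2048 blocks per call is a purely hypothetical value chosen for illustration, not a documented ext4 parameter.

```python
BLOCK_SIZE = 4 * 1024
FILE_SIZE = 50 * 1024 * 1024


def allocator_calls(file_size, blocks_per_call):
    """Number of allocator calls needed when each call hands out `blocks_per_call` blocks."""
    blocks = -(-file_size // BLOCK_SIZE)        # ceiling division
    return -(-blocks // blocks_per_call)


single_block_calls = allocator_calls(FILE_SIZE, 1)       # ext3-style: one block per call
multi_block_calls = allocator_calls(FILE_SIZE, 2048)     # hypothetical multi-block request size

print(single_block_calls, multi_block_calls)
```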
  20. Ext4 – delayed allocation
      ● Ext3
        ● Instant block allocation
        ● Fragmentation due to buffers and caches
      ● Ext4
        ● Delayed block allocation
        ● Uses cache information for placement
        ● Risk of data loss in early versions => improved since 2.6.30
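Why delaying the allocation reduces fragmentation can be shown with a toy model. The Python sketch below is not how ext3 or ext4 are implemented; it assumes a disk that hands out blocks strictly in ascending order and two files written in an interleaved fashion. All function names are invented for this illustration.

```python
def allocate_immediately(writes):
    """Assign the next free block at every write, one block per write."""
    layout, next_block = {}, 0
    for name in writes:
        layout.setdefault(name, []).append(next_block)
        next_block += 1
    return layout


def allocate_delayed(writes):
    """Only count buffered writes first, then place each file in one run at flush time."""
    pending = {}
    for name in writes:
        pending[name] = pending.get(name, 0) + 1
    layout, next_block = {}, 0
    for name, count in pending.items():
        layout[name] = list(range(next_block, next_block + count))
        next_block += count
    return layout


def count_extents(blocks):
    """Number of contiguous runs in a sorted list of block numbers."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


writes = ["a", "b", "a", "b", "a", "b"]     # two files written alternately
immediate = allocate_immediately(writes)
delayed = allocate_delayed(writes)
print({n: count_extents(b) for n, b in immediate.items()})
print({n: count_extents(b) for n, b in delayed.items()})
```

In this model the immediate strategy leaves each file in three separate runs, while the delayed strategy needs a single extent per file. The flip side, as the slide notes, is that data sits in the cache longer before it reaches the disk.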
  21. Ext4 – “clever” allocation
      ● Support of the system call fallocate()
        ● Application reserves blocks ahead of time
        ● File system ensures disk space availability
      ● Allocation information stored in the extent structure
        ● Remember the 16th bit of the extent length field
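From an application's point of view, preallocation can look like the following minimal Python sketch. It uses `os.posix_fallocate`, which is available on Unix-like systems only; on Linux the C library is expected to map it to fallocate() where the file system supports it, but that mapping is an assumption here, not something the slides state. The helper name `reserve` and the file name are illustrative.

```python
import os
import tempfile


def reserve(path, size):
    """Reserve `size` bytes for `path` up front and return the resulting file size."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Raises OSError (for example ENOSPC) if the space cannot be guaranteed.
        os.posix_fallocate(fd, 0, size)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    reserved = reserve(os.path.join(tmp, "preallocated.bin"), 1024 * 1024)
    print(reserved)
```

After the call succeeds, later writes into the reserved range should not fail for lack of disk space, which is the guarantee the slide refers to.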
  22. Ext4 – consistent status
      ● New journaling => JBD2
        ● Transactions have checksums
        ● 64-bit ready
        ● Deactivation possible
  23. Ext4 – repair
      ● Improved fsck
        ● No check of unused blocks
          – Information stored in the block group header
          – Information secured via checksums
          – (De)activation possible at any time
        ● First run as slow as in ext3
  24. Ext4 – other news
      ● Nanosecond precision timestamps
        ● Unix millennium bug postponed to 2514
      ● More subdirectories
        ● Up to 65000
        ● More than 65000 ... with limitations
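The year 2514 can be reproduced with a short calculation. The sketch below assumes that the seconds counter grows from a signed 32-bit value to a 34-bit value counted as unsigned from 1970; that reading is an assumption made to match the number on the slide, and the encoding actually used by the kernel may differ. The function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_representable(bits, signed):
    """Latest point in time a seconds counter of the given width can express."""
    seconds = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return EPOCH + timedelta(seconds=seconds)


classic = last_representable(32, signed=True)      # traditional 32-bit time_t
extended = last_representable(34, signed=False)    # assumption: two extra bits, unsigned

print(classic.isoformat())
print(extended.year)
```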
  25. Ext4 – general migration paths
      ● mkfs and backup/restore
        ● Clean new file system structure
        ● Only way for file systems other than ext2/3
        ● Extended outage
      ● Conversion via tune2fs
        ● Partial only
        ● Only possible for the ext family
        ● Faster/easier
  26. Ext4 – background for migration
      ● Two kinds of changes compared to ext3
        ● Changes of the on-disk format
          – Extents
          – Only enabled for new files via tune2fs
          – Additional tasks needed
        ● Changes independent of the on-disk format
          – Block allocation
          – Immediately enabled via tune2fs
  27. Ext4 – migration via tune2fs
      ● Results in a mix of ext3 and ext4 structures
      ● Access via the ext3 driver becomes impossible
      ● fsck needed

      Parameter     Description
      extent        Extent-based block allocation
      flex_bg       Flexible placement of metadata
      uninit_bg     Flag uninitialized blocks for faster fsck
      dir_nlink     Unlimited number of subdirectories
      extra_isize   Timestamps with nanosecond precision
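As a rough illustration of the conversion path, the Python sketch below only builds and prints the commands rather than running them. The device name `/dev/sdX1` is a placeholder, the choice of three features from the table is an assumption, and the helper name is invented. `tune2fs -O` sets file system features and `e2fsck -f` forces the check that the slide says is needed; the file system is assumed to be unmounted and backed up first.

```python
def migration_commands(device, features=("extent", "uninit_bg", "dir_nlink")):
    """Build (but do not run) the commands for an in-place ext3 -> ext4 conversion."""
    return [
        ["tune2fs", "-O", ",".join(features), device],   # enable the selected features
        ["e2fsck", "-f", device],                        # forced check after the change
    ]


# Dry run: print the commands for a hypothetical device instead of executing them.
commands = migration_commands("/dev/sdX1")
for command in commands:
    print(" ".join(command))
```

Keeping this a dry run is deliberate: once the features are set, the file system can no longer be mounted with the ext3 driver.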
  28. Ext4 – migration hints
      ● fsck recommended
      ● /boot – is booting from ext4 possible?
      ● Are the rescue media enabled for ext4?
  29. Ext4 – summary
      ● Good successor of ext3
      ● Manages larger amounts of data
      ● Faster
        ● Performance
        ● Recovery
      ● Safer
      ● Sufficient migration options from ext2/3
  30. Better/b-tree file system
      ● Shipped since 2.6.29
      ● Still experimental
      ● Intended to replace ext3/4
      ● New storage management approach
  31. BTRFS – history
      ● Basic idea
        ● Shown in 2007
        ● Usage of B-trees for standard structures
        ● Not new ... see XFS, ReiserFS
      ● Chris Mason
        ● Worked on ReiserFS for SUSE
        ● Moved to Oracle -> started BTRFS development
  32. BTRFS – facts
      ● Max file/volume size: 16 EByte
      ● Max length of file name: 255 Bytes
      ● Support of
        ● Extended attributes
        ● Encryption
        ● Compression
        ● Snapshots
        ● Copy-on-Write
  33. BTRFS – global structure
      ● Entry point -> superblock
      ● More than one file system per volume
      ● Extents
        ● Put together in block groups
        ● No mix of data and metadata
  34. BTRFS – internals: the trees
      ● Consists of B+ trees
        ● Root tree
        ● File system tree
        ● Extent allocation tree
        ● Checksum tree
        ● Log tree
        ● Chunk & device tree
        ● Data relocation tree
  35. BTRFS – internals: structures
      ● 3 structures
        ● Key
          – Index of the tree structure
        ● Block header
          – ID of file system
          – Reference of insert time
          – Level position
        ● Item
          – Different types: inodes, extents, directories
  36. BTRFS – internals: the key
      ● Index of the tree structure
      ● Size: 136 bit
      ● First 64 bit: unique object ID
      ● Next 8 bit: type/item
      ● Last 64 bit: item dependent
        ● e.g. hash of directory name
        ● e.g. number of elements in directory
        ● e.g. object ID of upper layer directory
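The 136 bit add up to 17 bytes: 8 for the object ID, 1 for the type and 8 for the item-dependent part. A minimal Python sketch of that layout follows; it assumes little-endian fields without padding, and the helper names and sample values are made up for illustration.

```python
import struct

# 64-bit object ID, 8-bit type, 64-bit item-dependent value; "<" means no padding.
BTRFS_KEY = struct.Struct("<QBQ")


def pack_key(object_id, item_type, offset):
    """Serialize one key into its 17-byte on-disk form (under the stated assumptions)."""
    return BTRFS_KEY.pack(object_id, item_type, offset)


def unpack_key(raw):
    """Return (object_id, item_type, offset) from 17 raw bytes."""
    return BTRFS_KEY.unpack(raw)


# Hypothetical key: object 256, type 84, and an arbitrary name hash as third field.
raw = pack_key(256, 84, 0xDEADBEEF)
print(BTRFS_KEY.size * 8, "bit:", unpack_key(raw))
```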
  37. BTRFS – internals: the item
      ● More than one item per object ID possible

      Item          Value
      INODE_ITEM    1
      XATTR_ITEM    24
      DIR_ITEM      84
      DIR_INDEX     96
      EXTENT_DATA   108
      EXTENT_CSUM   128
      ROOT_ITEM     132
      EXTENT_ITEM   168
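Because the object ID comes first in the key, all items belonging to one object end up next to each other once the keys are sorted. The Python sketch below illustrates this with the type values from the table; comparing keys as plain (object ID, type, offset) tuples is a simplifying assumption, and the sample keys are invented.

```python
from enum import IntEnum


class ItemType(IntEnum):
    """Item type values as listed on the slide."""
    INODE_ITEM = 1
    XATTR_ITEM = 24
    DIR_ITEM = 84
    DIR_INDEX = 96
    EXTENT_DATA = 108
    EXTENT_CSUM = 128
    ROOT_ITEM = 132
    EXTENT_ITEM = 168


# Hypothetical keys for a directory (object 256) and a file (object 257), unsorted.
keys = [
    (257, ItemType.EXTENT_DATA, 0),
    (256, ItemType.DIR_ITEM, 0x1234),
    (257, ItemType.INODE_ITEM, 0),
    (256, ItemType.INODE_ITEM, 0),
]

ordered = sorted(keys)      # tuples compare by object ID, then type, then offset
for object_id, item_type, offset in ordered:
    print(object_id, item_type.name, offset)
```

In the sorted result each object's inode item comes first, followed by its other items, which is what allows several items per object ID to live side by side in one tree.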
  38. BTRFS – more about trees
      ● Highest layer
        ● Root tree
        ● Referenced in superblock
        ● Other trees => object ID in root tree
      ● Some trees are unique
        ● Extent allocation
        ● Data relocation
      ● Possibly multiple trees
        ● File system
  39. BTRFS – file system tree
      ● Visible part
      ● Contains:
        ● Inode items
        ● Reference items
      ● No file data
        ● See extents
        ● Exception: small files
  40. BTRFS – extent allocation tree
      ● Space management
      ● Backward reference
        ● File system object
        ● Possibly multiple per extent
        ● Maybe move to extent data reference object
  41. BTRFS – other trees
      ● Log tree
        ● Collects fsync() calls
        ● Journal for this kind of COW call
      ● Checksum tree
        ● CRC32 checksums of data and metadata
      ● Chunk tree
        ● Manages devices: device item and chunk map item
      ● Device tree
        ● Counterpart of the chunk tree
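To show what a CRC32 checksum over a data block looks like, here is a small bitwise Python sketch. It implements the Castagnoli variant (CRC-32C), which to my knowledge is the polynomial BTRFS uses; the slide itself only says CRC32. The sketch does not claim to reproduce the exact values stored on disk, and the function name and the sample block are illustrative.

```python
def crc32c(data):
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


block = b"some file system block"
stored = crc32c(block)                      # checksum recorded when the block is written

damaged = b"some file system clock"         # one corrupted byte
print(hex(stored), crc32c(block) == stored, crc32c(damaged) == stored)
```

Recomputing the checksum on every read and comparing it with the stored value is what lets the file system notice silent corruption of data or metadata.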
  42. BTRFS – device management
      ● Included volume manager
      ● Pool concept
      ● RAID-0 and RAID-1
        ● For data and metadata
        ● Not necessarily identical
      ● Chunk tree
        ● Abstracts from disk blocks
  43. BTRFS – extents, chunks, blocks (figure)
  44. BTRFS – what else
      ● Transparent compression via zlib
      ● Support of POSIX ACLs
      ● Online grow/shrink
      ● Online addition/removal of disks
      ● No fsck tool (yet)
      ● Management tool evolution (btrfsctl -> btrfs)
  45. BTRFS – migration I
      ● Via the tool btrfs-convert
      ● du/df not fully BTRFS-aware
      ● In place from ext3/4
        ● Via libext2fs
        ● BTRFS metadata location is flexible
        ● Old ext3/4 file system kept as a snapshot
        ● Roll-back possible to the date/time of conversion
  46. BTRFS – migration II (figure)
  47. BTRFS summary
      ● Still experimental
      ● Meets standard file system requirements
      ● Bridges existing gaps
        ● e.g. snapshots
      ● Easy migration from ext3/4 possible
      ● New approach to storage management
        ● e.g. included volume manager
  48. Summary
      ● Moving to ext4 is an improvement
      ● Switching to ext4 is safe
      ● In-place migration from ext3 possible
      ● The future is BTRFS
      ● In-place migration from ext3/4 to BTRFS possible
  49. References
      ● http://ext4.wiki.kernel.org
      ● http://btrfs.wiki.kernel.org
  50. Thank you!
