Osdc2011.ext4btrfs.talk

Some technical information on EXT4 and BTRFS (Spring 2011)

Transcript

  • 1. Quo vadis Linux File Systems: Ext4 or BTRFS
      Udo Seidel
  • 2. Agenda
      ● Introduction/motivation
      ● ext4 – the new member of the extfs family
          ● Facts, specs
          ● Migration
      ● BTRFS – the newbie .. the hope
          ● Facts, specs
          ● Migration
      ● Summary
      OSDC 2011
  • 3. Linux file systems
      ● More than 50 file systems shipped with Linux kernel
          ● Local
          ● Remote
          ● Cluster
          ● ...
      ● A few as standard for root directory
          ● ext2, ext3
          ● XFS
  • 4. Linux file systems – challenges
      ● ReiserFS sun-setted
      ● Limitations of ext3
      ● Changes in recent Enterprise distributions
  • 5. Linux file systems – new players
      ● New version of the ext family -> ext4
          ● Marked as stable
          ● Shipped with Enterprise distributions
      ● New approach with BTRFS
          ● Still experimental
          ● Default by some projects, e.g. MeeGo
  • 6. 4th extended file system
      ● Shipped since 2.6.19
      ● Stable since 2.6.28
      ● To overcome limits of ext3
          ● Size
          ● Performance
  • 7. Ext4 – history
      ● Successor of ext3
      ● Started as set of patches for ext3
      ● Later forked
          ● First called ext3dev (sometimes ext4dev)
          ● No impact on ext3 stability
          ● Fewer dependencies on ext3 code
          ● Easier to maintain source code
  • 8. Ext4 – facts
      ● Max volume size: 1 EByte = 1024 PByte
      ● Max file size: 16 TByte
      ● Max length of file name: 255 Bytes
      ● Support of extended attributes
      ● No encryption
      ● No real compression
      ● Partially 64bit
  • 9. Ext4 – starting from known
      ● Known tools
          ● mkfs
          ● fsck
          ● tune2fs
          ● e2label
  • 10. Ext4 – global structure I
      ● Entry point -> superblock
          ● Block size
          ● Number of blocks and inodes
          ● Number of free blocks and inodes
      ● Disk divided in block groups
          ● Backup of superblock
          ● Block group description (inode/block bitmaps)
  • 11. Ext4 – global structure II
      ● Similar to ext3
      ● Inherits some ext3 limitations
          ● Number of inodes per block group
      ● 2nd type of block groups => flexible
          ● Flexible placement of bitmaps
      ● Bigger inodes to store additional information
          ● 256 Bytes
          ● Nano second time stamps
  • 12. Ext4 – from blocks to extents
      ● Common addressing for modern file systems
      ● Contiguous area of blocks
          ● Less management information needed
          ● Less meta data operations
          ● Less “fragmentation”
      ● Requires change of on-disk format
  • 13. Ext4 – extent I
      ● 15 bit for extent size
          ● Block size of 4 KByte => 128 MByte
      ● 1 bit for extent initialization information

      struct ext4_extent {
          __le32  ee_block;     /* first logical block extent covers */
          __le16  ee_len;       /* number of blocks covered by extent */
          __le16  ee_start_hi;  /* high 16 bits of physical block */
          __le32  ee_start_lo;  /* low 32 bits of physical block */
      };
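The fields on this slide can be combined as in the following rough sketch. It uses host-endian fixed-width types instead of the kernel's little-endian `__le16`/`__le32` (byte-order conversion is left out), and the struct name, helper names and the `EXTENT_INIT_MAX_LEN` constant are made up for this illustration. The rule that a length value above 2^15 marks an uninitialized extent is an assumption about the kernel's convention, not something the slide states.

```c
#include <stdint.h>

/* Host-endian stand-in for struct ext4_extent (illustration only). */
struct extent_sketch {
    uint32_t ee_block;     /* first logical block the extent covers */
    uint16_t ee_len;       /* block count plus initialization marker */
    uint16_t ee_start_hi;  /* high 16 bits of physical block */
    uint32_t ee_start_lo;  /* low 32 bits of physical block */
};

#define EXTENT_INIT_MAX_LEN (1u << 15)  /* 15 bits of extent size */

/* Combine the two start fields into one 48-bit physical block number. */
uint64_t extent_start(const struct extent_sketch *e)
{
    return ((uint64_t)e->ee_start_hi << 32) | e->ee_start_lo;
}

/* Assumed convention: values above 2^15 flag an uninitialized extent. */
int extent_is_uninitialized(const struct extent_sketch *e)
{
    return e->ee_len > EXTENT_INIT_MAX_LEN;
}

/* Length in blocks with the marker removed. */
uint32_t extent_length(const struct extent_sketch *e)
{
    return e->ee_len <= EXTENT_INIT_MAX_LEN
        ? e->ee_len
        : (uint32_t)e->ee_len - EXTENT_INIT_MAX_LEN;
}

/* Largest span a single extent can cover, in bytes. */
uint64_t extent_max_bytes(uint32_t block_size)
{
    return (uint64_t)EXTENT_INIT_MAX_LEN * block_size;
}
```

With a 4 KByte block size, `extent_max_bytes(4096)` gives the 128 MByte quoted on the slide.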
  • 14. Ext4 – extent II
      ● 32 bit for block addresses inside file
          ● Block size of 4 KByte => 16 TByte
      ● 48 (!) bit for block addresses of file system
          ● Block size of 4 KByte => 1 EByte
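Both limits on this slide are the number of addressable blocks multiplied by the block size. A minimal sketch of that arithmetic, with a hypothetical helper name chosen for this example:

```c
#include <stdint.h>

/* Bytes addressable with `bits` block-address bits and a given block size.
 * Returns 0 if the result would not fit into 64 bits. */
uint64_t addressable_bytes(unsigned bits, uint64_t block_size)
{
    if (bits >= 64)
        return 0;
    uint64_t blocks = (uint64_t)1 << bits;
    if (block_size != 0 && blocks > UINT64_MAX / block_size)
        return 0;
    return blocks * block_size;
}
```

`addressable_bytes(32, 4096)` is 2^44 bytes (16 TByte per file) and `addressable_bytes(48, 4096)` is 2^60 bytes (1 EByte per file system).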
  • 15. Ext4 – extent III
      ● 60 Byte for extent information
          ● 12 Byte for extent header
          ● 12 Byte for extent structure
              – Up to 4 extents per inode
              – max. 512 MByte direct addressable (ext3: 48 KByte)
              – Different schema for bigger files
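The numbers on this slide follow from the structure sizes. The sketch below reproduces them under a few stated assumptions: host-endian stand-in structs with names invented for this example, a 60-byte area inside the inode, extents of at most 2^15 blocks, and 12 direct block pointers for ext3.

```c
#include <stdint.h>

/* Host-endian stand-ins for the two 12-byte structures (illustration only). */
struct hdr_sketch {
    uint16_t eh_magic;
    uint16_t eh_entries;
    uint16_t eh_max;
    uint16_t eh_depth;
    uint32_t eh_generation;
};

struct ext_sketch {
    uint32_t ee_block;
    uint16_t ee_len;
    uint16_t ee_start_hi;
    uint32_t ee_start_lo;
};

enum { INODE_EXTENT_AREA = 60 };  /* bytes available in the inode */

/* Extents that fit next to the header inside the inode. */
unsigned extents_per_inode(void)
{
    return (unsigned)((INODE_EXTENT_AREA - sizeof(struct hdr_sketch))
                      / sizeof(struct ext_sketch));
}

/* Bytes reachable without an extent tree, assuming 2^15 blocks per extent. */
uint64_t ext4_direct_bytes(uint32_t block_size)
{
    return (uint64_t)extents_per_inode() * 32768u * block_size;
}

/* ext3 comparison: 12 direct block pointers, one block each. */
uint64_t ext3_direct_bytes(uint32_t block_size)
{
    return 12ull * block_size;
}
```

For 4 KByte blocks this yields 4 extents, 512 MByte for ext4 and 48 KByte for ext3.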
  • 16. Ext4 – extent tree I
      ● For files > 512 MByte
      ● B+ tree
      ● Extent structure only at leaf nodes
      ● New element: extent index
          ● Same header structure as data extent
          ● Points to data block
          ● Data block contains either extent index or extent structure
  • 17. Ext4 – extent tree II
  • 18. Ext4 – from extents to blocks
      ● At the end block allocation
      ● New features
          ● Multi-block allocation
          ● Delayed allocation
          ● Persistent allocation
  • 19. Ext4 – multi-block allocation
      ● Ext3: only one block
          ● 12800 calls for 50 MByte file
      ● Ext4: multiple blocks per call
          ● Less overhead
          ● Contiguous physical location of data
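The figure of 12800 calls is the 50 MByte file divided into 4 KByte blocks, one allocator call per block. A small sketch of that calculation; the function name and the `blocks_per_call` parameter are hypothetical, and how many blocks ext4 really hands out per call is not given on the slide.

```c
#include <stdint.h>

/* Number of allocator calls needed for a file when each call hands out
 * `blocks_per_call` blocks. Both size arguments must be non-zero. */
uint64_t allocation_calls(uint64_t file_bytes, uint32_t block_size,
                          uint32_t blocks_per_call)
{
    uint64_t blocks = (file_bytes + block_size - 1) / block_size; /* round up */
    return (blocks + blocks_per_call - 1) / blocks_per_call;      /* round up */
}
```

`allocation_calls(50ull * 1024 * 1024, 4096, 1)` reproduces the ext3 number; any larger `blocks_per_call` shrinks the count accordingly.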
  • 20. Ext4 – delayed allocation
      ● Ext3
          ● Instant block allocation
          ● Fragmentation due to buffers and caches
      ● Ext4
          ● Delayed block allocation
          ● Use cache information for placement
          ● Risk of data loss in early versions => improved since 2.6.30
  • 21. Ext4 – “clever” allocation
      ● Support of system call fallocate()
          ● Application reserves blocks ahead
          ● File system ensures disk space availability
      ● Allocation information in extent structure
          ● Remember 16th bit
  • 22. Ext4 – consistent status
      ● New journaling => JBD2
          ● Transactions have checksums
          ● 64 bit ready
          ● Deactivation possible
  • 23. Ext4 – repair
      ● Improved fsck()
          ● No check of unused blocks
              – Information stored in block group header
              – Information secured via checksums
              – (De)activation possible at any time
          ● First run as slow as in ext3
  • 24. Ext4 – other news
      ● Nano second precision time stamps
          ● Unix millennium bug shifted to 2514
      ● More subdirectories
          ● Up to 65000
          ● More than 65000 ... with limitation
  • 25. Ext4 – general migration paths
      ● mkfs() and backup/restore
          ● Clean new file system structure
          ● Only way for file systems other than ext2/3
          ● Extended outage
      ● Conversion via tune2fs
          ● Partial only
          ● Only possible for ext family
          ● Faster/easier
  • 26. Ext4 – background for migration
      ● Two kinds of changes compared to ext3
          ● Change of on-disk format:
              – Extents
              – Only enabled for new files via tune2fs
              – Additional tasks needed
          ● On-disk format not relevant:
              – Block allocation
              – Immediately enabled via tune2fs
  • 27. Ext4 – migration via tune2fs
      ● Results in mix of ext3 and ext4 structure
      ● Access via ext3 driver impossible
      ● fsck() needed

      Parameter     Description
      extent        Extent based block allocation
      flex_bg       Flexible placement of meta data
      uninit_bg     Flag uninitialized blocks for faster fsck
      dir_nlink     Infinite number of sub directories
      extra_isize   Timestamps with nano seconds
  • 28. Ext4 – migration hints
      ● fsck() recommended
      ● /boot – booting from ext4 possible?
      ● Rescue media enabled for ext4?
  • 29. Ext4 – summary
      ● Good successor of ext3
      ● Manages higher amount of data
      ● Faster
          ● Performance
          ● Recovery
      ● Safer
      ● Sufficient migration options from ext2/3
  • 30. Better/b-tree file system
      ● Shipped since 2.6.29
      ● Still experimental
      ● Replace ext3/4
      ● New storage management approach
  • 31. BTRFS – history
      ● Basic idea
          ● Shown 2007
          ● Usage of B trees for standard structures
          ● Not new ... see XFS, ReiserFS
      ● Chris Mason
          ● Worked on ReiserFS for SUSE
          ● Moved to Oracle -> started BTRFS development
  • 32. BTRFS – facts
      ● Max file/volume size: 16 EByte
      ● Max length of file name: 255 Bytes
      ● Support of
          ● Extended attributes
          ● Encryption
          ● Compression
          ● Snapshot
          ● Copy-on-Write
  • 33. BTRFS – global structure
      ● Entry point -> superblock
      ● More than one file system per volume
      ● Extents
          ● Put together in block groups
          ● No mix of data and meta data
  • 34. BTRFS – internals: the trees
      ● Consists of B+ trees
          ● Root tree
          ● File system tree
          ● Extent allocation tree
          ● Checksum tree
          ● Log tree
          ● Chunk & device tree
          ● Data relocation tree
  • 35. BTRFS – internals: structures
      ● 3 structures
          ● Key
              – Index of the tree structure
          ● Block header
              – ID of file system
              – Reference of insert time
              – Level position
          ● Item
              – Different types: inodes, extents, directories
  • 36. BTRFS – internals: the key
      ● Index of the tree structure
      ● Size: 136 bit
      ● First 64 bit: unique object ID
      ● Next 8 bit: type/item
      ● Last 64 bit: item dependent
          ● e.g. Hash of directory name
          ● e.g. Number of elements in directory
          ● e.g. object ID of upper layer directory
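The 136-bit key layout can be pictured as a packed C structure. The sketch below is an illustration only: it uses host-endian fields, the struct and function names are invented for this example, and `__attribute__((packed))` is a GCC/Clang extension. Ordering keys by object ID, then type, then the last field is assumed here as the sort order inside the trees.

```c
#include <stdint.h>

/* 64 + 8 + 64 bits; packed so that no padding bytes are inserted. */
struct key_sketch {
    uint64_t objectid;  /* unique object ID */
    uint8_t  type;      /* item type */
    uint64_t offset;    /* meaning depends on the item type */
} __attribute__((packed));

/* Compare two keys field by field: object ID first, then type, then
 * offset. Returns -1, 0 or 1 like a typical comparison callback. */
int key_compare(const struct key_sketch *a, const struct key_sketch *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}
```

`sizeof(struct key_sketch)` is 17 bytes, which matches the 136 bit on the slide; with this ordering all items of one object end up next to each other in the tree.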
  • 37. BTRFS – internals: the item
      ● More than one item per object ID possible

      Item          Value
      INODE_ITEM    1
      XATTR_ITEM    24
      DIR_ITEM      84
      DIR_INDEX     96
      EXTENT_DATA   108
      EXTENT_CSUM   128
      ROOT_ITEM     132
      EXTENT_ITEM   168
  • 38. BTRFS – more about trees
      ● Highest layer
          ● Root tree
          ● Referenced in superblock
          ● Other trees => object ID in root tree
      ● Some trees unique
          ● Extent allocation
          ● Data relocation
      ● Possibly multiple trees
          ● File system
  • 39. BTRFS – file system tree
      ● Visible part
      ● Contains:
          ● Inode items
          ● Reference items
      ● No data of files
          ● See extents
          ● Exception: small files
  • 40. BTRFS – extent allocation tree
      ● Space management
      ● Backward reference
          ● File system object
          ● Possibly multiple per extent
          ● Maybe move to extent data reference object
  • 41. BTRFS – other trees
      ● Log tree
          ● Collects fsync() calls
          ● Journal of this kind of COW calls
      ● Checksum tree
          ● CRC32 checksums of data and meta data
      ● Chunk tree
          ● Manage devices: device item and chunk map item
      ● Device tree
          ● Counterpart of chunk tree
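To give an idea of what a per-block checksum involves, here is a minimal bitwise CRC sketch. It assumes the CRC-32C (Castagnoli) variant, which Btrfs is generally documented to use, with the common all-ones seed and final inversion; the slide only says "CRC32" and does not specify seed or finalization, so treat those details as assumptions. Production code would use a table-driven or hardware-accelerated routine instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C over a buffer, reflected polynomial 0x82F63B78. */
uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;                /* all-ones seed */

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)      /* one shift per input bit */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;                               /* final inversion */
}
```

A file system would compute such a value when a block is written and compare it again on every read to detect silent corruption.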
  • 42. BTRFS – device management
      ● Included volume manager
      ● Pool concept
      ● RAID-0 and RAID-1
          ● For data and meta data
          ● Not necessarily identical
      ● Chunk tree
          ● Abstract from disk block
  • 43. BTRFS – extents, chunks, blocks
  • 44. BTRFS – what else
      ● Transparent compression via zlib
      ● Support of POSIX ACLs
      ● Online grow/shrink
      ● Online add/removal of disks
      ● No fsck() tool (yet)
      ● Management tool evolution (btrfsctl -> btrfs)
  • 45. BTRFS – migration I
      ● Via tool btrfs-convert
      ● du/df not fully BTRFS-aware
      ● In place from ext3/4
          ● Via libext2fs
          ● BTRFS meta data location flexible
          ● Old ext3/4 organized in snapshot
          ● Roll-back possible to date/time of conversion
  • 46. BTRFS – migration II
  • 47. BTRFS summary
      ● Still experimental
      ● Meets standard file systems requirements
      ● Bridges existing gaps
          ● e.g. snapshots
      ● Easy migration from ext3/4 possible
      ● New approach to storage management
          ● e.g. included volume manager
  • 48. Summary
      ● Improvement moving to ext4
      ● Safe switching to ext4
      ● In place migration from ext3 possible
      ● Future is BTRFS
      ● In place migration from ext3/4 to BTRFS possible
  • 49. References
      ● http://ext4.wiki.kernel.org
      ● http://btrfs.wiki.kernel.org
  • 50. Thank you!
