why we need ext4

Usage Rights

© All Rights Reserved

    • Why we need ext4? Robin Dong <sanbai@taobao.com>
    • ext2 global layout Image from: http://learn.akae.cn/media/ch29s02.html
    • ext2 global layout
      • Super Block (1 block)
      • GDT (multiple blocks)
      • Block Bitmap (1 block)
      • Inode Bitmap (1 block)
      • Inode table (multiple blocks)
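The order of these regions inside a block group can be sketched as simple offset arithmetic. This is a simplified model, not mke2fs's real allocator: the function name and the example parameters (2048 inodes per group, 128-byte inodes, 4KB blocks) are hypothetical, offsets are relative to the start of the group, and the reserved GDT blocks and the 1KB boot area are ignored.

```python
def ceil_div(a: int, b: int) -> int:
    return -(-a // b)

def group_layout(has_backup: bool, gdt_blocks: int, inodes_per_group: int,
                 inode_size: int, block_size: int) -> dict:
    """Hypothetical sketch: starting block of each region, relative to the group."""
    layout = {}
    pos = 0
    if has_backup:                     # only some groups carry super block/GDT copies
        layout["super_block"] = pos
        pos += 1
        layout["gdt"] = pos
        pos += gdt_blocks
    layout["block_bitmap"] = pos       # 1 block
    pos += 1
    layout["inode_bitmap"] = pos       # 1 block
    pos += 1
    layout["inode_table"] = pos        # as many blocks as the inodes need
    pos += ceil_div(inodes_per_group * inode_size, block_size)
    layout["data"] = pos               # data blocks follow the inode table
    return layout

print(group_layout(True, 1, 2048, 128, 4096))
# {'super_block': 0, 'gdt': 1, 'block_bitmap': 2, 'inode_bitmap': 3, 'inode_table': 4, 'data': 68}
```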
    • ext2 global layout
      • The super block and GDT are vital, so other groups store copies of them. If the filesystem is created with “sparse_super” (the default), not every group holds a copy: only Group 0, Group 1 and the groups whose numbers are powers of 3, 5 or 7 (3, 5, 7, 3², 5², 7², 3³, 5³, 7³, …) do.
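The “sparse_super” rule above can be written as a small predicate. This is a sketch of the rule as stated on the slide; the function names are illustrative, not e2fsprogs' API.

```python
def is_power_of(n: int, base: int) -> bool:
    """True if n == base**k for some k >= 1."""
    if n < base:
        return False
    while n % base == 0:
        n //= base
    return n == 1

def has_super_backup(group: int) -> bool:
    """sparse_super: groups 0, 1 and powers of 3, 5, 7 hold super block/GDT copies."""
    if group <= 1:
        return True
    return any(is_power_of(group, b) for b in (3, 5, 7))

print([g for g in range(50) if has_super_backup(g)])
# [0, 1, 3, 5, 7, 9, 25, 27, 49]
```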
    • There is a structure called “Reserved GDT”, placed after the GDT and before the block bitmap. It is stored as a file (the resize inode) and can be large. It is used by the “resize” feature, which can expand the whole filesystem.
    • ext2 file layout Image from: http://e2fsprogs.sourceforge.net/ext2intro.html
      • An ext2 directory is laid out just like a regular file, but its data blocks contain a sequence of “struct ext2_dir_entry” records.
      ext2 directory layout Image from: http://www.pluto.it/files/journal/pj9811/e2fs.html
      • ext2_dir_entry records have variable lengths, so to find a file in a directory, ext2 has to compare file names one by one (it cannot use an algorithm such as binary search). If a directory holds a large number of files, lookups become inefficient.
      ext2 directory layout
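The linear lookup described above can be sketched with Python's `struct` module. This assumes the classic `struct ext2_dir_entry` header (32-bit inode, 16-bit rec_len, 16-bit name_len, little-endian) with records padded to 4 bytes; `build_block` and `find_entry` are hypothetical helpers for illustration only.

```python
import struct

HEADER = struct.Struct("<IHH")  # inode, rec_len, name_len (assumed ext2_dir_entry header)

def rec_len_for(name: bytes) -> int:
    # 8-byte header + name, rounded up to a multiple of 4
    return (HEADER.size + len(name) + 3) & ~3

def build_block(entries, block_size=1024) -> bytes:
    """Pack (inode, name) pairs into one block; the last entry absorbs the slack."""
    out = bytearray()
    for i, (ino, name) in enumerate(entries):
        rec_len = rec_len_for(name)
        if i == len(entries) - 1:
            rec_len = block_size - len(out)
        out += HEADER.pack(ino, rec_len, len(name)) + name
        out += bytes(rec_len - HEADER.size - len(name))  # zero padding
    return bytes(out)

def find_entry(block: bytes, target: bytes):
    """Entries have variable length, so every record must be visited in order."""
    off = 0
    while off + HEADER.size <= len(block):
        ino, rec_len, name_len = HEADER.unpack_from(block, off)
        if rec_len == 0:               # corrupted block: stop instead of looping
            break
        name = block[off + HEADER.size: off + HEADER.size + name_len]
        if ino != 0 and name == target:  # inode 0 marks an unused record
            return ino
        off += rec_len
    return None

blk = build_block([(2, b"."), (2, b".."), (12, b"hello.txt")])
print(find_entry(blk, b"hello.txt"))   # 12
```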
    • ext2 directory remove Image from: http://blog.csdn.net/anghlq/archive/2011/05/17/6427052.aspx
    • ext2 directory pack
      • e2fsck -D
      • Optimize directories in filesystem. This option causes e2fsck to try to optimize all directories, either by reindexing them if the filesystem supports directory indexing, or by sorting and compressing directories for smaller directories, or for filesystems using traditional linear directories.
      • Regular Symlink: the link path is stored in a data block
      • Fast Symlink: the link path is stored in the inode itself (if the link path is shorter than 60 bytes, the size of the inode's i_block array)
      ext2 symlink
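Choosing between the two kinds of symlink is a size check. This sketch assumes the fast-symlink space is the inode's 60-byte i_block array (15 four-byte block pointers) and that the stored path keeps its terminating NUL, as a C string would; `symlink_kind` is a hypothetical name.

```python
I_BLOCK_BYTES = 15 * 4  # assumption: 15 x 4-byte block pointers inside the inode

def symlink_kind(target: str) -> str:
    """Return "fast" if the path (plus a NUL byte) fits in i_block, else "regular"."""
    if len(target.encode()) + 1 <= I_BLOCK_BYTES:
        return "fast"
    return "regular"

print(symlink_kind("/usr/lib/libfoo.so.1"))  # fast
print(symlink_kind("a" * 100))               # regular
```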
    • ext2 symlink Image from: http://www.pluto.it/files/journal/pj9811/e2fs.html
    • ext2 hard link
    • ext2 xattr
    • ext2 xattr
        +--------------------+
        | header             |
        | entry 1            | |
        | entry 2            | | growing downwards
        | entry 3            | v
        | four null bytes    |
        | . . .              |
        | value 1            | ^
        | value 3            | | growing upwards
        | value 2            | |
        +--------------------+
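The diagram shows entries filling the block from the header forwards while values fill it from the end backwards, until the two meet. A minimal sketch of that packing, assuming a 32-byte block header, a 16-byte fixed part per entry, and 4-byte padding; these sizes and the `pack_xattrs` helper are assumptions for illustration, not the kernel code.

```python
HEADER_SIZE = 32   # assumption: size of the xattr block header
ENTRY_FIXED = 16   # assumption: fixed part of an entry, before the name
PAD = 4            # entries and values padded to 4 bytes

def pad(n: int) -> int:
    return (n + PAD - 1) & ~(PAD - 1)

def pack_xattrs(attrs, block_size=1024):
    """Return (name, entry_offset, value_offset) per attribute; raise if they collide."""
    entry_end = HEADER_SIZE      # entries start right after the header
    value_start = block_size     # values start at the end of the block
    placed = []
    for name, value in attrs:
        entry_off = entry_end
        entry_end += pad(ENTRY_FIXED + len(name))
        value_start -= pad(len(value))
        # keep room for the four null bytes that terminate the entry list
        if entry_end + 4 > value_start:
            raise ValueError("xattr block is full")
        placed.append((name, entry_off, value_start))
    return placed

print(pack_xattrs([(b"user.a", b"xyz"), (b"user.bb", b"12345")]))
# [(b'user.a', 32, 1020), (b'user.bb', 56, 1012)]
```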
    • ext2: bad blocks
      • e2fsck uses the “badblocks” program to detect bad blocks and marks these blocks as “used” in the block bitmap.
      • If metadata lies in bad blocks, e2fsck tries to allocate new blocks for it.
    • enhancements of ext3
      • Journal
        • ext3 can be seen as an ext2 filesystem with a journal file
      • dir_index
        • more efficient directory searching
    • ext3: journal
      • An ext2 filesystem may be corrupted after an abnormal restart, such as a sudden power loss.
      • The journal keeps the filesystem consistent by recovering it when the system boots.
      • Journal mode
        • Writeback
        • Ordered
        • Journal
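The recovery step can be illustrated with a toy write-ahead log: at boot, only transactions whose commit record reached the journal are replayed, so a half-written transaction is discarded as a whole. This is a simplified model of the idea, not JBD's on-disk format; the record tuples and `replay` function are hypothetical.

```python
def replay(journal, disk):
    """Apply only committed transactions.

    journal: records ("write", txn_id, block_no, data) or ("commit", txn_id).
    disk: dict block_no -> data, updated in place and returned.
    """
    pending = {}     # txn_id -> list of (block_no, data)
    committed = []   # txn_ids in commit order
    for rec in journal:
        if rec[0] == "write":
            _, txn, blk, data = rec
            pending.setdefault(txn, []).append((blk, data))
        elif rec[0] == "commit":
            committed.append(rec[1])
    for txn in committed:
        for blk, data in pending.get(txn, []):
            disk[blk] = data
    return disk

# Transaction 2 was cut off by a power loss before its commit record.
journal = [("write", 1, 10, "inode"), ("write", 1, 11, "bitmap"),
           ("commit", 1), ("write", 2, 12, "dir")]
print(replay(journal, {}))  # {10: 'inode', 11: 'bitmap'}
```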
    • ext3: dir_index
        • Compute the hash value of the file name
        • Binary-search the dx_entry array in the root block for that hash value
        • Scan the ext3_dir_entry records in the leaf block one by one
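The three lookup steps can be sketched with a one-level index. Assumptions: `zlib.crc32` stands in for ext3's real name hash (half-MD4 or TEA), leaves hold a hypothetical 4 names each, and hash collisions that span two leaves are ignored; `build_htree` and `htree_lookup` are illustrative names.

```python
import bisect
import zlib

def name_hash(name: bytes) -> int:
    # Stand-in hash only; ext3 uses half-MD4 or TEA, not CRC32.
    return zlib.crc32(name) & 0x7FFFFFFF

def build_htree(names, leaf_capacity=4):
    """Sort names by hash, cut into leaves; root holds (first_hash, leaf_index)."""
    ordered = sorted(names, key=name_hash)
    leaves = [ordered[i:i + leaf_capacity] for i in range(0, len(ordered), leaf_capacity)]
    root = [(0 if i == 0 else name_hash(leaf[0]), i) for i, leaf in enumerate(leaves)]
    return root, leaves

def htree_lookup(root, leaves, name: bytes) -> bool:
    h = name_hash(name)                                # step 1: hash the name
    starts = [start for start, _ in root]
    idx = bisect.bisect_right(starts, h) - 1           # step 2: binary search the root
    return any(e == name for e in leaves[root[idx][1]])  # step 3: scan the leaf

names = [f"file{i}".encode() for i in range(50)]
root, leaves = build_htree(names)
print(htree_lookup(root, leaves, b"file42"), htree_lookup(root, leaves, b"nope"))
```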
    • ext3: dir_index
      • Advantage: dir_index has at most two levels of index, so finding a file in a directory needs to read at most 3 blocks.
      • Imagine an ext3 filesystem with a 4K block size: a directory could contain about 5 million files (with 100-byte file names).
      • Disadvantage: when files are added to a directory, the b-tree splits, but when files are deleted, the b-tree does not merge.
      • So a directory that now holds only a few files may still occupy many blocks.
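The “about 5 million” figure can be reproduced with rough arithmetic. Assumptions: 8-byte dx_entries, 32 bytes of overhead in the root block (“.”, “..” and the root info) and 8 bytes in each index node, 100-byte names rounded up to a 108-byte record, and leaves only about half full, as right after a split.

```python
BLOCK = 4096
DX_ENTRY = 8                              # assumed: 4-byte hash + 4-byte block number

root_fanout = (BLOCK - 32) // DX_ENTRY    # assumed 32 bytes of root overhead
node_fanout = (BLOCK - 8) // DX_ENTRY     # assumed 8-byte fake dirent per index node
leaves = root_fanout * node_fanout        # two index levels

def dir_rec_len(name_len: int) -> int:
    return (8 + name_len + 3) & ~3        # 8-byte header + name, 4-byte aligned

per_leaf = BLOCK // dir_rec_len(100)      # 100-byte file names
full = leaves * per_leaf
half_full = full // 2                     # assumption: leaves about half full

print(root_fanout, node_fanout, leaves, per_leaf, full, half_full)
# 508 511 259588 37 9604756 4802378
```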
    • ext3 xattr
      • Store xattrs inside the inode (this needs inodes larger than 128 bytes)
      • Less I/O: no separate xattr block has to be read
      • mkfs.ext3 -I 256 /dev/sda
    • limits of ext2/ext3
        Block size        Max file size   Max filesystem size
        1KB               16GB            2TB
        2KB               256GB           8TB
        4KB               2TB             16TB
        8KB (ppc arch)    2TB             32TB
      • Reading data through a file's indirect blocks costs extra I/O
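Both the extra I/O and the max-file-size column follow from the 12 direct pointers plus single, double and triple indirect blocks. A rough model, assuming 32-bit block pointers and a cap from i_blocks counting 512-byte sectors in 32 bits; it ignores that the cap also counts the indirect blocks themselves, so real limits are slightly lower.

```python
N_DIRECT = 12

def indirect_depth(lblk: int, block_size: int) -> int:
    """Number of indirect blocks read to map logical block lblk."""
    ptrs = block_size // 4                # 32-bit block pointers per block
    if lblk < N_DIRECT:
        return 0
    lblk -= N_DIRECT
    for depth in (1, 2, 3):
        span = ptrs ** depth
        if lblk < span:
            return depth
        lblk -= span
    raise ValueError("beyond triple-indirect range")

def max_file_size(block_size: int) -> int:
    ptrs = block_size // 4
    addressable = (N_DIRECT + ptrs + ptrs**2 + ptrs**3) * block_size
    i_blocks_cap = 2**32 * 512            # assumed 32-bit count of 512-byte sectors
    return min(addressable, i_blocks_cap)

for bs in (1024, 2048, 4096, 8192):
    print(bs, round(max_file_size(bs) / 2**30, 1), "GiB")
```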
    • ext4
      • ext4 inherits all the features of ext2/ext3
      • Larger filesystems (with 4KB blocks)
      • Max file size: 16TB
      • Max filesystem size: 1EB(1048576TB)
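Both limits are block-number width times block size. This assumes 4KB blocks, 48-bit physical block numbers and 32-bit logical block numbers in ext4 extents.

```python
BLOCK = 4096
max_fs = 2**48 * BLOCK         # assumed 48-bit physical block numbers
max_file = 2**32 * BLOCK       # assumed 32-bit logical block numbers per file

print(max_fs // 2**40, "TiB")   # 1048576 TiB = 1 EiB
print(max_file // 2**40, "TiB") # 16 TiB
```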
    • ext4: meta_bg Image from: http://www.ibm.com/developerworks/cn/linux/l-cn-filesrc5/
    • ext4: meta_bg
      • The group descriptor size is 64 bytes (with the 64bit feature)
      • Imagine an ext4 filesystem with block_size = 1K
      • 1K/64 = 16
      • so a meta group contains 16 groups.
      • The meta-GDT (1 block) of each meta group is stored in its first, second and last groups:
        • Group 0, Group 1, Group 15
        • Group 16, Group 17, Group 31
        • Group 32, Group 33, Group 47
        • …
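The placement rule above reduces to a few lines. A sketch with a hypothetical function name, using the slide's example of 1K blocks and 64-byte descriptors.

```python
def meta_bg_gdt_groups(meta_group: int, block_size: int, desc_size: int):
    """Groups holding the meta-GDT block: first, second and last of the meta group."""
    groups_per_meta = block_size // desc_size  # descriptors that fit in one block
    first = meta_group * groups_per_meta
    return [first, first + 1, first + groups_per_meta - 1]

for mg in range(3):
    print(meta_bg_gdt_groups(mg, 1024, 64))
# [0, 1, 15]
# [16, 17, 31]
# [32, 33, 47]
```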
    • ext4: flex_bg
    • ext4: flex_bg
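      • Pack the block bitmaps, inode bitmaps and inode tables of several groups (here Group 0 to Group 3) together into the first one, Group 0
      • The positions of the super block and GDT still follow the “sparse” rule
      • Advantage: frees that space in Group 1, Group 2 and Group 3, leaving longer contiguous runs (especially useful for ext4 extents)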
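A simplified sketch of the packing for the slide's example of 4 groups: all block bitmaps back to back, then all inode bitmaps, then all inode tables, starting in Group 0. The start block (2) and inode-table size (64 blocks) are hypothetical, and mke2fs's real placement may differ.

```python
def flex_bg_layout(flex_size, start_block, inode_table_blocks):
    """Return {group: (block_bitmap, inode_bitmap, inode_table_start)}."""
    block_bitmaps = start_block
    inode_bitmaps = block_bitmaps + flex_size      # after all block bitmaps
    inode_tables = inode_bitmaps + flex_size       # after all inode bitmaps
    return {
        g: (block_bitmaps + g, inode_bitmaps + g, inode_tables + g * inode_table_blocks)
        for g in range(flex_size)
    }

for group, blocks in flex_bg_layout(4, 2, 64).items():
    print(group, blocks)
# 0 (2, 6, 10)
# 1 (3, 7, 74)
# 2 (4, 8, 138)
# 3 (5, 9, 202)
```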
    • ext4: uninit_bg
      • mkfs.ext4 -O uninit_bg
      • Create a filesystem without initializing all of the block groups. This feature also enables checksums and highest-inode-used statistics in each blockgroup. This feature can speed up filesystem creation time noticeably (if lazy_itable_init is enabled), and can also reduce e2fsck time dramatically.
    • ext4: uninit_bg
      • When is a block group initialized?
        • When lazy_itable_init runs
        • On first use, e.g. ext4_new_inode -> ext4_read_block_bitmap
    • ext4: extent Image from: http://www.ibm.com/developerworks/cn/linux/l-cn-filesrc5/
    • ext4: extent
      • An ext4_extent can point to up to 128MB of contiguous space (with 4KB blocks).
      • Example: a 300GB file in ext3 occupies 300MB of metadata blocks, but in ext4 it occupies only 36KB.
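The 300MB vs 36KB example can be checked with rough arithmetic. Assumptions: 4KB blocks, one 4-byte pointer per data block in ext3 (the indirect blocks holding them are not counted separately), 12-byte extent entries and node headers, at most 32768 blocks per extent, and room for 4 entries in the inode itself.

```python
import math

BLOCK = 4096
FILE = 300 * 2**30
blocks = FILE // BLOCK

# ext3: one 4-byte block pointer per data block
ext3_meta = blocks * 4

# ext4: one 12-byte extent per 32768 blocks (128MB with 4KB blocks)
MAX_EXTENT_LEN = 32768
extents = math.ceil(blocks / MAX_EXTENT_LEN)
per_node = (BLOCK - 12) // 12             # 12-byte header, 12-byte entries
nodes = math.ceil(extents / per_node)     # leaf blocks
ext4_blocks = nodes
while nodes > 4:                          # inode holds up to 4 entries itself
    nodes = math.ceil(nodes / per_node)   # one more level of index blocks
    ext4_blocks += nodes
ext4_meta = ext4_blocks * BLOCK

print(ext3_meta // 2**20, "MiB vs", ext4_meta // 2**10, "KiB")  # 300 MiB vs 36 KiB
```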
    • ext4: delayed allocation
      • It consists of delaying block allocation until the data is going to be written to the disk
      • This improves performance and reduces fragmentation by improving block allocation decisions based on the actual file size
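A toy simulation shows why knowing the final size helps. Two files are appended in turn against a hypothetical bump allocator: allocating at every write interleaves their blocks, while allocating once at writeback gives each file one contiguous extent. This illustrates the idea only, not ext4's mballoc.

```python
def allocate(writes, delayed: bool):
    """writes: list of (file, n_blocks) in arrival order. Returns {file: extent_count}."""
    next_free = 0
    extents = {}                       # file -> list of [start, length]

    def alloc(f, n):
        nonlocal next_free
        runs = extents.setdefault(f, [])
        if runs and runs[-1][0] + runs[-1][1] == next_free:
            runs[-1][1] += n           # contiguous with the file's last extent
        else:
            runs.append([next_free, n])
        next_free += n

    if delayed:
        totals = {}
        for f, n in writes:            # buffer in memory; size known at flush
            totals[f] = totals.get(f, 0) + n
        for f, n in totals.items():    # one allocation per file at writeback
            alloc(f, n)
    else:
        for f, n in writes:            # allocate on every write
            alloc(f, n)
    return {f: len(runs) for f, runs in extents.items()}

writes = [("a", 1), ("b", 1)] * 4
print(allocate(writes, delayed=False))  # {'a': 4, 'b': 4}
print(allocate(writes, delayed=True))   # {'a': 1, 'b': 1}
```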
    • Q & A Thanks!