ext2-110628041727-phpapp02

Why do we need ext4?
Robin Dong <sanbai@taobao.com>

ext2 global layout
Image from: http://learn.akae.cn/media/ch29s02.html

ext2 global layout
Super Block (1 block)
GDT (multi blocks)
Block Bitmap (1 block)
Inode Bitmap (1 block)
Inode table (multi blocks)

ext2 global layout
The superblock and GDT are vital, so copies of them are stored in other block groups.
If the filesystem is created with “sparse_super” (the default), not every group holds a copy of the superblock and GDT; only groups 0, 1, 3, 5, 7, 3², 5², 7², 3³, 5³, 7³, ... (that is, 0, 1 and the powers of 3, 5 and 7) have one.
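
The sparse_super rule can be sketched in a few lines of Python. This is only an illustration of the rule stated on the slide (groups 0 and 1, plus powers of 3, 5 and 7); the function names are made up for this example and are not taken from e2fsprogs.

```python
def is_power_of(n: int, base: int) -> bool:
    """Return True if n == base**k for some integer k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def has_superblock_backup(group: int) -> bool:
    """Groups 0 and 1, plus every power of 3, 5 or 7, keep a copy."""
    if group <= 1:
        return True
    return any(is_power_of(group, base) for base in (3, 5, 7))


def backup_groups(total_groups: int) -> list:
    """List the group numbers below total_groups that hold a copy."""
    return [g for g in range(total_groups) if has_superblock_backup(g)]


print(backup_groups(50))
```

For a filesystem with 50 block groups this lists 0, 1, 3, 5, 7, 9, 25, 27 and 49.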

There is a structure called “Reserved GDT”, placed after the GDT and before the block bitmap; it is also a large file.
It is used by the “resize” feature, which can expand the size of the whole filesystem.

ext2 file layout
Image from: http://e2fsprogs.sourceforge.net/ext2intro.html

ext2 directory layout
An ext2 directory is laid out just like a regular file, but its data blocks hold a sequence of “struct ext2_dir_entry” records.
Image from: http://www.pluto.it/files/journal/pj9811/e2fs.html

ext2 directory layout
The ext2_dir_entry records vary in length, so when a user looks up a file in a directory, ext2 has to check the filenames one by one. (It cannot use an algorithm such as binary search.)
If a directory contains a large number of files, lookups become inefficient.
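
A rough Python sketch of such a linear scan follows. It assumes the record layout of the ext2_dir_entry_2 variant (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file_type, then the name) and pads every record to 4 bytes; the helper names are hypothetical, and a real directory block additionally stretches the last record to the end of the block.

```python
import struct

# Assumed fixed header of one record: inode, rec_len, name_len, file_type.
HEADER = struct.Struct("<IHBB")


def pack_entry(inode: int, name: bytes, file_type: int) -> bytes:
    """Build one record, padded to a 4-byte boundary."""
    rec_len = (HEADER.size + len(name) + 3) & ~3
    body = HEADER.pack(inode, rec_len, len(name), file_type) + name
    return body.ljust(rec_len, b"\0")


def linear_lookup(block: bytes, wanted: bytes):
    """Walk the block record by record; return the inode number or None."""
    offset = 0
    while offset + HEADER.size <= len(block):
        inode, rec_len, name_len, _ = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:
            break  # malformed record: stop instead of looping forever
        start = offset + HEADER.size
        name = block[start:start + name_len]
        if inode != 0 and name == wanted:
            return inode
        offset += rec_len  # only rec_len says where the next record starts
    return None


# A tiny demo directory: ".", ".." and one regular file.
block = (pack_entry(2, b".", 2)
         + pack_entry(2, b"..", 2)
         + pack_entry(12, b"hello.txt", 1))
print(linear_lookup(block, b"hello.txt"))
```

Because each record's position depends on the rec_len of the one before it, there is no way to jump to the middle of the block, which is why the cost grows with the number of entries.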

ext2 directory remove
Image from: http://blog.csdn.net/anghlq/archive/2011/05/17/6427052.aspx

ext2 directory pack
● e2fsck -D
● Optimize directories in filesystem. This
option causes e2fsck to try to optimize all
directories, either by reindexing them if
the filesystem supports directory
indexing, or by sorting and compressing
directories for smaller directories, or for
filesystems using traditional linear
directories.

ext2 symlink
● Regular Symlink: link path is stored in a data block
● Fast Symlink: link path is stored in the inode (if the link path is shorter than 60 bytes, the size of the inode's block-pointer array)

ext2 symlink
Image from: http://www.pluto.it/files/journal/pj9811/e2fs.html

ext2 xattr
+--------------------+
| header             |
| entry 1            | |
| entry 2            | | growing downwards
| entry 3            | v
| four null bytes    |
| . . .              |
| value 1            | ^
| value 3            | | growing upwards
| value 2            | |
+--------------------+

ext2: badblock
● e2fsck uses the “badblocks” program to detect bad blocks and marks these blocks as “used” in the block bitmap.
● If metadata lies in a bad block, e2fsck will try to allocate a new block for it.

enhancements of ext3
● Journal
ext3 can be viewed as an ext2 filesystem with a journal file
● dir_index
more efficient directory searching

ext3: journal
● An ext2 filesystem may be corrupted by an unclean reboot, for example after a sudden power reset.
● The journal keeps the filesystem consistent, or allows it to be recovered at system boot.
● Journal modes
● Writeback
● Ordered
● Journal

ext3: dir_index
● Compute the hash value of the name in ext3_dir_entry
● Find the dx_entry for that hash value in the root block by binary search
● Scan the ext3_dir_entry records in the leaf block one by one
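
The three steps can be illustrated with a toy, single-level index in Python. This is a sketch only: CRC32 stands in for the hash functions ext3 really uses, the lists `keys` and `leaves` play the role of the dx_entry array and the leaf blocks, and hash collisions that straddle two leaves are ignored.

```python
import bisect
import zlib


def name_hash(name: str) -> int:
    # Stand-in hash for the demo; ext3 has its own directory hash functions.
    return zlib.crc32(name.encode())


def build_index(names, leaf_capacity=4):
    """Sort names by hash and cut them into fixed-size leaves.

    Returns (keys, leaves), where keys[i] is the lowest hash in leaves[i].
    """
    ordered = sorted(names, key=name_hash)
    leaves = [ordered[i:i + leaf_capacity]
              for i in range(0, len(ordered), leaf_capacity)]
    keys = [name_hash(leaf[0]) for leaf in leaves]
    return keys, leaves


def indexed_lookup(keys, leaves, name) -> bool:
    h = name_hash(name)                    # step 1: hash the name
    i = bisect.bisect_right(keys, h) - 1   # step 2: binary search the index
    if i < 0:
        return False
    return name in leaves[i]               # step 3: linear scan of one leaf


names = [f"file{i:03d}" for i in range(20)]
keys, leaves = build_index(names)
print(indexed_lookup(keys, leaves, "file007"))
```

Only one leaf is scanned per lookup, so the linear part of the search is bounded by the leaf size rather than by the size of the directory.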

ext3: dir_index
● Advantage: dir_index has at most two index levels, so finding a file in a directory needs to read at most 3 blocks.
Imagine an ext3 filesystem with a 4K block size: a directory could contain about 5 million files (with 100-byte file names).
● Disadvantage: when files are added to a directory the b-tree splits, but when files are deleted the b-tree does not merge.
A directory with only a few files left can therefore still occupy many blocks.

ext3 xattr
● Put xattr into inode.
● Less IO
● mkfs.ext3 -I 256 /dev/sda

limits of ext2/ext3
Block Size        Max file size   Max filesystem size
1KB               16GB            2TB
2KB               256GB           8TB
4KB               2TB             16TB
8KB (ppc arch)    2TB             32TB
● Reading data through a file's indirect blocks costs extra IO
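
The "Max file size" column can be reproduced approximately with a little arithmetic. The sketch below assumes 12 direct pointers plus single, double and triple indirect blocks with 4-byte pointers, and it assumes the 2TB ceiling comes from a 32-bit count of 512-byte sectors in the inode; the function name is made up for this example.

```python
def max_file_size(block_size: int) -> int:
    """Approximate upper bound (bytes) for a file mapped by indirect blocks."""
    ptrs = block_size // 4                       # 4-byte pointers per block
    mappable_blocks = 12 + ptrs + ptrs**2 + ptrs**3
    by_indirect = mappable_blocks * block_size   # limit of the pointer tree
    by_sector_count = 2**32 * 512                # assumed 32-bit sector counter
    return min(by_indirect, by_sector_count)


for bs in (1024, 2048, 4096, 8192):
    print(bs, max_file_size(bs) // 2**30, "GB")
```

With 1KB and 2KB blocks the pointer tree is the limit; from 4KB upward the assumed sector counter caps the result at 2TB, which matches the table.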

ext4
● ext4 inherits all the features of
ext2/ext3
● Larger filesystem
Max file size: 16TB
Max filesystem size: 1EB(1048576TB)

ext4: meta_bg
Image from: http://www.ibm.com/developerworks/cn/linux/l-cn-filesrc5/

ext4: meta_bg
● Group descriptor size is 64 bytes
● Imagine an ext4 filesystem with block_size = 1K
1K/64 = 16
so a meta group contains 16 groups.
The meta-GDT (1 block) is stored in the first, second and last group of each meta group:
Group 0, Group 1, Group 15
Group 16, Group 17, Group 31
Group 32, Group 33, Group 47
......
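
The placement can be written as a small Python helper. It is a sketch under the assumption that the descriptor block of each meta group is kept in the first, second and last group of that meta group; the function name and its defaults are chosen only for this illustration.

```python
def meta_bg_descriptor_groups(meta_group: int, block_size: int = 1024,
                              desc_size: int = 64) -> list:
    """Groups holding the descriptor block of a meta group and its backups."""
    groups_per_meta = block_size // desc_size   # descriptors per block
    first = meta_group * groups_per_meta
    return [first, first + 1, first + groups_per_meta - 1]


for mg in range(3):
    print(mg, meta_bg_descriptor_groups(mg))
```

With a 4K block size the same helper gives 64 groups per meta group, so meta group 0 would use groups 0, 1 and 63.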

ext4: flex_bg
● Merge the Block-Bitmap/Inode-Bitmap/Inode-table of several groups into Group 0
● The positions of the Super-block and GDT still follow the “sparse” rule
● Advantage: frees the space in Group 1, Group 2, Group 3 for data (especially useful for ext4 extents)

ext4: uninit_bg
● mkfs.ext4 -O uninit_bg
● Create a filesystem without initializing all of
the block groups. This feature also
enables checksums and highest-inode-used
statistics in each blockgroup. This feature
can speed up filesystem creation time
noticeably (if lazy_itable_init is enabled), and
can also reduce e2fsck time dramatically.

ext4: uninit_bg
● When is a block group initialized?
● when lazy_itable_init runs
● ext4_new_inode → ext4_read_block_bitmap

ext4: extent
Image from: http://www.ibm.com/developerworks/cn/linux/l-cn-filesrc5/

ext4: extent
● An ext4_extent can point to 128MB of contiguous space.
● Example: a 300G file occupies about 300MB of metadata blocks in ext3, but only 36KB in ext4
ext4: delayed allocation
● It consists of delaying block allocation
until the data is going to be written to
the disk
● This improves performance and
reduces fragmentation by improving
block allocation decisions based on
the actual file size
