Why we need ext4? Robin Dong <sanbai@taobao.com>
Transcript of "why we need ext4"

  1. Why we need ext4? Robin Dong <sanbai@taobao.com>
  2. ext2 global layout. Image from: http://learn.akae.cn/media/ch29s02.html
  3. ext2 global layout: Super Block (1 block), GDT (multiple blocks), Block Bitmap (1 block), Inode Bitmap (1 block), Inode table (multiple blocks)
  4. ext2 global layout: The super block and GDT are vital, so other groups store copies of them. If the filesystem is created with “sparse_super” (the default), not every group holds a copy of the super block and GDT; only Groups 0, 1, 3, 5, 7, 3^2, 5^2, 7^2, 3^3, 5^3, 7^3, ... have one.
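
The “sparse_super” rule on this slide (groups 0 and 1, plus powers of 3, 5, and 7) can be checked with a few lines of Python. This is only a sketch of the rule as stated on the slide; the function names are made up for illustration and are not taken from e2fsprogs.

```python
def has_backup(group):
    """Return True if this group keeps a super block / GDT copy under sparse_super."""
    if group in (0, 1):
        return True
    for base in (3, 5, 7):
        power = base
        while power < group:
            power *= base
        if power == group:
            return True
    return False


def backup_groups(group_count):
    """List every group below group_count that holds a copy."""
    return [g for g in range(group_count) if has_backup(g)]


if __name__ == "__main__":
    print(backup_groups(400))
```

For a filesystem with 400 groups, only 13 of them carry a copy, which is where the space saving comes from.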
  5. There is a structure called “Reserved GDT”, which is placed after the GDT and before the block bitmap; it is also a large file. It is used by the “resize” feature, which can expand the size of the whole filesystem.
  6. ext2 file layout. Image from: http://e2fsprogs.sourceforge.net/ext2intro.html
  7. ext2 directory layout: An ext2 directory is laid out just like a regular file, but the content of its data blocks is a sequence of “struct ext2_dir_entry” records. Image from: http://www.pluto.it/files/journal/pj9811/e2fs.html
  8. ext2 directory layout: ext2_dir_entry records vary in length, so to find a file in a directory ext2 has to check the file names one by one (it cannot use an algorithm such as binary search). If a directory holds a large number of files, searching becomes inefficient.
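
The one-by-one search described above can be sketched as follows. The record header used here (4-byte inode, 2-byte record length, 1-byte name length, 1-byte file type, then the name) is an assumption about the on-disk entry format, and `pack_entry`, `find_entry`, and `demo_block` are hypothetical helpers written only for this illustration.

```python
import struct

# Assumed record header: inode (u32), rec_len (u16), name_len (u8), file_type (u8)
HEADER = struct.Struct("<IHBB")


def pack_entry(inode, name, rec_len=None):
    """Build one variable-length record; rec_len defaults to the 4-byte-aligned size."""
    raw = name.encode()
    if rec_len is None:
        rec_len = (HEADER.size + len(raw) + 3) & ~3
    return HEADER.pack(inode, rec_len, len(raw), 1) + raw.ljust(rec_len - HEADER.size, b"\0")


def find_entry(block, name):
    """Walk the records in order, following rec_len, until the name matches."""
    target = name.encode()
    offset = 0
    while offset + HEADER.size <= len(block):
        inode, rec_len, name_len, _file_type = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:
            break  # corrupt record, stop instead of looping forever
        start = offset + HEADER.size
        if inode != 0 and block[start:start + name_len] == target:
            return inode
        offset += rec_len
    return None


def demo_block(block_size=1024):
    """Three entries; the last record is stretched to the end of the block."""
    first = pack_entry(11, "a")
    second = pack_entry(12, "hello.txt")
    last = pack_entry(13, "z", rec_len=block_size - len(first) - len(second))
    return first + second + last
```

Because each record's position depends on the length of every record before it, there is no way to jump to the middle of the block, so the lookup cost grows linearly with the number of entries.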
  9. ext2 directory remove. Image from: http://blog.csdn.net/anghlq/archive/2011/05/17/6427052.aspx
  10. ext2 directory pack: e2fsck -D
        - Optimize directories in filesystem. This option causes e2fsck to try to optimize all directories, either by reindexing them if the filesystem supports directory indexing, or by sorting and compressing directories for smaller directories, or for filesystems using traditional linear directories.
  12. ext2 symlink
        - Regular symlink: the link path is stored in a data block.
        - Fast symlink: the link path is stored in the inode (if the link path is shorter than 60 bytes).
  14. ext2 symlink. Image from: http://www.pluto.it/files/journal/pj9811/e2fs.html
  15. ext2 hard link
  16. ext2 xattr
  17. ext2 xattr
        +--------------------+
        | header             |
        | entry 1            | |
        | entry 2            | | growing downwards
        | entry 3            | v
        | four null bytes    |
        | . . .              |
        | value 1            | ^
        | value 3            | | growing upwards
        | value 2            | |
        +--------------------+
  28. ext2: badblock
        - e2fsck uses the “badblocks” program to detect bad blocks and marks them as “used” in the block bitmap.
        - If metadata is located in bad blocks, e2fsck will try to allocate new blocks for it.
  30. Enhancements of ext3
        - Journal: ext3 can be viewed as an ext2 filesystem with a journal file
        - dir_index: more efficient directory searching
  34. ext3: journal
        - An ext2 filesystem may be corrupted after an abnormal reboot, such as a sudden power reset.
        - The journal keeps the filesystem consistent, or allows it to be recovered at system boot.
        - Journal modes: Writeback, Ordered, Journal
  39. ext3: dir_index
        - Compute the hash value of the file name in the ext3_dir_entry
        - Find the dx_entry for that hash value in the root block by binary search
        - Scan the ext3_dir_entry records in the leaf block one by one
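
The three lookup steps can be sketched with a single index level. This is a simplified model, not the kernel's implementation: `toy_hash` (CRC32) stands in for the real directory hash, the index is a plain sorted list, and `build_index` and `lookup` are hypothetical names.

```python
import bisect
import zlib


def toy_hash(name):
    """Stand-in hash for illustration; ext3 uses its own directory hash functions."""
    return zlib.crc32(name.encode())


def build_index(names, leaf_capacity=4):
    """Sort entries by hash, cut them into leaves, and key each leaf by its lowest hash."""
    entries = sorted((toy_hash(n), n) for n in names)
    leaves = [entries[i:i + leaf_capacity] for i in range(0, len(entries), leaf_capacity)]
    keys = [0] + [leaf[0][0] for leaf in leaves[1:]]  # the first leaf covers hashes from 0
    return keys, leaves


def lookup(keys, leaves, name):
    """Step 1: hash the name. Step 2: binary-search the keys. Step 3: scan the leaf."""
    if not leaves:
        return False
    h = toy_hash(name)
    idx = max(bisect.bisect_left(keys, h) - 1, 0)
    # Usually one leaf is scanned; a following leaf is also checked when its key
    # equals the hash, so entries with colliding hashes are not missed.
    while idx < len(leaves) and keys[idx] <= h:
        if any(entry_name == name for _, entry_name in leaves[idx]):
            return True
        idx += 1
    return False
```

Only the small sorted key list is searched by bisection; the linear scan is confined to one leaf (occasionally two), which is what makes large directories cheaper to search than in ext2.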
  42. ext3: dir_index
        - Advantage: the dir_index has at most two levels of index, so finding a file in a directory requires reading at most 3 blocks.
        - Imagine an ext3 filesystem with a 4K block size: a directory could contain about 5 million files (with 100-byte file names).
        - Disadvantage: when files are added to a directory the b-tree splits, but after files are deleted the b-tree does not merge.
        - A directory with only a few files may therefore occupy many blocks.
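
One way to arrive at a figure of roughly 5 million files is the back-of-the-envelope estimate below. The slide does not show its derivation, so every number here is an assumption: 8-byte index entries, an 8-byte header in front of each 100-byte name, index block headers ignored, and leaf blocks that are on average half full after splitting.

```python
BLOCK_SIZE = 4096
INDEX_ENTRY_SIZE = 8      # assumed size of one index entry (hash + block number)
DIRENT_HEADER_SIZE = 8    # assumed fixed header in front of each file name
NAME_LENGTH = 100


def estimated_capacity(fill_ratio=0.5):
    """Rough number of files reachable through a two-level index."""
    entries_per_index_block = BLOCK_SIZE // INDEX_ENTRY_SIZE          # upper bound, headers ignored
    leaf_blocks = entries_per_index_block ** 2                        # two index levels
    names_per_leaf = BLOCK_SIZE // (DIRENT_HEADER_SIZE + NAME_LENGTH)
    return int(leaf_blocks * names_per_leaf * fill_ratio)


if __name__ == "__main__":
    print(estimated_capacity())
```

With these assumptions the estimate lands a little under 5 million; fully packed leaves would roughly double it.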
  46. ext3 xattr
        - Put xattrs into the inode.
        - Less IO
        - mkfs.ext3 -I 256 /dev/sda
  49. limits of ext2/ext3

        Block Size       Max file size   Max filesystem size
        1KB              16GB            2TB
        2KB              256GB           8TB
        4KB              2TB             16TB
        8KB (ppc arch)   2TB             32TB

        - Reading data through a file's indirect blocks costs extra IO.
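
The “Max file size” column can be reproduced from the indirect-block scheme. The sketch below assumes 12 direct pointers plus single, double, and triple indirect blocks with 4-byte block pointers, and a 2TB ceiling from a 32-bit count of 512-byte sectors per inode; neither assumption is stated on the slide, and the function names are illustrative.

```python
SECTOR_COUNT_CAP = 2**32 * 512  # assumed ceiling: 32-bit count of 512-byte sectors


def indirect_capacity(block_size):
    """Bytes addressable through direct, single, double, and triple indirect pointers."""
    pointers_per_block = block_size // 4
    blocks = 12 + pointers_per_block + pointers_per_block**2 + pointers_per_block**3
    return blocks * block_size


def max_file_size(block_size):
    """The smaller of the two limits wins."""
    return min(indirect_capacity(block_size), SECTOR_COUNT_CAP)


if __name__ == "__main__":
    for size in (1024, 2048, 4096, 8192):
        print(size, max_file_size(size) // 2**30, "GB")
```

For 1KB and 2KB blocks the indirect tree is the limit; for 4KB and 8KB blocks the sector counter is reached first, which is why both rows show 2TB.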
  50. ext4
        - ext4 inherits all the features of ext2/ext3
        - Larger filesystem
        - Max file size: 16TB
        - Max filesystem size: 1EB (1048576TB)
  54. ext4: meta_bg. Image from: http://www.ibm.com/developerworks/cn/linux/l-cn-filesrc5/
  55. ext4: meta_bg
        - The group descriptor size is 64 bytes
        - Imagine an ext4 filesystem with block_size = 1K
        - 1K/64 = 16, so a meta group contains 16 groups.
        - The meta-GDT (1 block) is stored in the first, second, and last group of each meta group:
            Group 0, Group 1, Group 15
            Group 16, Group 17, Group 31
            Group 32, Group 33, Group 47
            ...
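
The placement pattern above can be computed directly. This is a small sketch of the rule as the slide describes it (first, second, and last group of each meta group); `meta_bg_descriptor_groups` is a hypothetical name, and the defaults mirror the slide's 1K block and 64-byte descriptor.

```python
def meta_bg_descriptor_groups(meta_group, block_size=1024, desc_size=64):
    """Groups that hold the descriptor block for the given meta group."""
    groups_per_meta_group = block_size // desc_size   # descriptors that fit in one block
    first = meta_group * groups_per_meta_group
    return [first, first + 1, first + groups_per_meta_group - 1]


if __name__ == "__main__":
    for meta_group in range(3):
        print(meta_group, meta_bg_descriptor_groups(meta_group))
```

With a 4K block the same function gives 64 groups per meta group, so the descriptor copies are spread further apart.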
  63. ext4: flex_bg
  64. ext4: flex_bg
        - Move the block bitmaps, inode bitmaps, and inode tables of several groups into Group 0
        - The positions of the super block and GDT still follow the “sparse” rule
        - Advantage: frees up the space of Group 1, Group 2, Group 3 (especially useful for ext4 extents)
  67. ext4: uninit_bg: mkfs.ext4 -O uninit_bg
        - Create a filesystem without initializing all of the block groups. This feature also enables checksums and highest-inode-used statistics in each blockgroup. This feature can speed up filesystem creation time noticeably (if lazy_itable_init is enabled), and can also reduce e2fsck time dramatically.
  69. ext4: uninit_bg
        - When is a block group initialized?
            - when lazy_itable_init runs
            - ext4_new_inode -> ext4_read_block_bitmap
  71. ext4: extent. Image from: http://www.ibm.com/developerworks/cn/linux/l-cn-filesrc5/
  72. ext4: extent
        - One ext4_extent can point to 128MB of contiguous space (with 4K blocks).
        - Example: a 300GB file occupies about 300MB of metadata blocks in ext3, but only about 36KB in ext4.
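
The 300MB and 36KB figures can be reproduced with the arithmetic below. It is a best-case sketch under assumptions not stated on the slide: 4K blocks, 4-byte block pointers in ext3 (higher-level indirect blocks ignored), a perfectly contiguous file, 12-byte extent records with a 12-byte header per tree block, and room for 4 entries inside the inode.

```python
BLOCK = 4096
GIB = 2**30


def ceil_div(a, b):
    return -(-a // b)


def ext3_pointer_bytes(file_size):
    """One 4-byte pointer per data block."""
    return ceil_div(file_size, BLOCK) * 4


def ext4_extent_tree_bytes(file_size):
    """Tree blocks needed when every extent covers the maximum 32768 blocks."""
    extents = ceil_div(file_size, 32768 * BLOCK)      # 128MB per extent
    if extents <= 4:
        return 0                                      # fits in the inode itself
    extents_per_block = (BLOCK - 12) // 12            # records after the block header
    leaf_blocks = ceil_div(extents, extents_per_block)
    index_blocks = 1 if leaf_blocks > 4 else 0        # inode can reference 4 leaves directly
    return (leaf_blocks + index_blocks) * BLOCK


if __name__ == "__main__":
    size = 300 * GIB
    print(ext3_pointer_bytes(size) // 2**20, "MB in ext3")
    print(ext4_extent_tree_bytes(size) // 2**10, "KB in ext4")
```

A fragmented file needs more extents than this best case, so the ext4 figure is a lower bound rather than a guarantee.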
  74. ext4: delayed allocation
        - Block allocation is delayed until the data is actually about to be written to disk.
        - This improves performance and reduces fragmentation, because allocation decisions can be based on the actual file size.
  76. Q & A. Thanks!