Effect of readahead and file system block reallocation
for LBCAS (LoopBack Content Addressable Storage)

Linux Symposium 2009 Slide Suzaki "Effect of readahead and file system block reallocation for LBCAS (LoopBack Content Addressable Storage)"

http://www.linuxsymposium.org/2009/

Paper: http://www.kernel.org/doc/ols/2009/ols2009-pages-275-286.pdf

  1. Effect of readahead and file system block reallocation for LBCAS (LoopBack Content Addressable Storage)
      @ Linux Symposium 2009, Montreal, Canada, 17/July
      Paper: http://www.kernel.org/doc/ols/2009/ols2009-pages-275-286.pdf
      Kuniyasu Suzaki †, Toshiki Yagi †, Kengo Iijima †, Nguyen Anh Quynh †, Yoshihito Watanabe ††
      † Research Center for Information Security
      ††
  2. Key words
      • LBCAS: Loopback Content Addressable Storage
         – Virtual block device (network-transparent block device)
      • readahead
         – Disk prefetch mechanism in the Linux kernel
            • The system call “readahead” is a different function.
      • File system block reallocation
         – A kind of defrag tool
         – We developed “ext2/3optimizer”, which reallocates i-node data blocks.
      Today’s talk covers optimization methods that use them.
  3. Today’s Contents
      • Motivation
         – What is LBCAS used for?
         – Correlation among LBCAS, file system block reallocation (ext2/3optimizer), and disk prefetch (readahead)
      • LBCAS: Loopback Content Addressable Storage
      • Optimization: ext2/3optimizer and readahead
      • Performance Results
      • Conclusions
  4. Motivation: What is “LBCAS” used for?
      • LBCAS was developed for OS Circular.
      • OS Circular is a project to distribute bootable disk images for virtual machines and real machines.
         – OS Circular project
            • http://openlab.jp/oscircular/
  5. OS Circular (Big Picture)
      [Figure: OS suppliers (timely updates) publish block files on an HTTP server on the Internet (LBCAS: Loopback Content Addressable Storage). Users construct a virtual disk from the block files and try the OS without installation, on a virtual machine (KVM, QEMU) or a real machine.]
  6. Performance Issues (Today’s Main Topic)
      • LBCAS is sensitive to access patterns.
         – Performance is affected by the number and size of disk prefetches (“readahead” of the Linux kernel).
      • The number and size of readahead can be optimized by file system block reallocation.
         – Defrag tools are not enough. We developed “ext2/3optimizer”.
      [Figure: ext2/3optimizer reallocates blocks of ext2/3 based on an access profile → the number of readaheads is reduced and the size of readahead is extended → the performance of LBCAS is increased. Labels: General, Technique; Presentation Order ③ ② ①.]
  7. LBCAS: LoopBack Content Addressable Storage
      • LBCAS = CAS + LoopBack
         – CAS
            • Indirect addressing by the SHA-1 digest of block contents
            • Benefit: identical blocks have the same SHA-1 digest, which reduces total storage.
            • Mainly used for archives. Example: Venti of Plan 9 [USENIX FAST’02]
         – LoopBack
            • Virtual block device. A file is used as a block device.
            • The file abstraction makes it easy to handle.
      • LBCAS saves each block to a file, which is called a “block file”. The file is named by the SHA-1 digest of its contents.
      • Block files are managed by a “mapping table” file, which is a table of physical addresses and SHA-1 file names.
  8. Block files of LBCAS
      Mapping table (map01.idx):
         Address              File Name
         00000000-0003FFFF    4ad36ffe8…
         00040000-0007FFFF    974daf34a…
         00080000-000BFFFF    2d34ff3e1…
         000C0000-000FFFFF    3310012a…
         …                    …
      [Figure: a block device (ext2, 4KB pages) is split into 256KB block files listed in the mapping table. Each block file is named by the SHA-1 digest of its contents and compressed by zlib. The block files are reconstructed as a virtual disk with LBCAS.]
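
The mapping table on this slide can be read as a lookup from a byte offset on the virtual disk to a block file. The following is a minimal sketch of that lookup, assuming 256KB block files and an in-memory list as the table; the names `mapping_table`, `locate`, and `address_range` are hypothetical, and the digests are placeholders rather than real SHA-1 values.

```python
BLOCK_SIZE = 256 * 1024  # 256KB block files, as in the slide's example

# Hypothetical in-memory form of a mapping table: entry i covers bytes
# [i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE). The names are placeholders.
mapping_table = ["digest-a", "digest-b", "digest-c", "digest-d"]


def locate(offset, table=mapping_table, block_size=BLOCK_SIZE):
    """Return (block file name, offset inside the uncompressed block)."""
    index, inner = divmod(offset, block_size)
    return table[index], inner


def address_range(index, block_size=BLOCK_SIZE):
    """Format the address range covered by entry `index`, as on the slide."""
    start = index * block_size
    return "%08X-%08X" % (start, start + block_size - 1)
```

With this layout, `address_range(1)` yields the second row of the slide's table, and `locate(0x40000)` resolves to the second block file at inner offset 0.
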
  9. LBCAS (1/2)
      • An LBCAS image is made from an existing normal block device.
      • The original block device is split into fixed-size blocks (64KB - 512KB), and each block is compressed by zlib.
      • Block files are reconstructed into a loopback file by a FUSE wrapper.
         – FUSE is a user-land file system.
            • http://fuse.sf.net
      • Each block file is measured (checked) against its SHA-1 file name when it is mapped to the loopback file.
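
The split, compress, and name steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the LBCAS tool itself: the function names are hypothetical, the store is an in-memory dict instead of files on an HTTP server, and the digest is assumed to be taken over the compressed bytes (the slides only say the file is named by the SHA-1 digest of its contents).

```python
import hashlib
import zlib


def build_block_files(image, block_size=256 * 1024):
    """Split image bytes into fixed-size blocks; return (mapping_table, store)."""
    mapping_table = []
    store = {}
    for start in range(0, len(image), block_size):
        compressed = zlib.compress(image[start:start + block_size])
        # Assumption: the name is the SHA-1 of the stored (compressed) bytes.
        name = hashlib.sha1(compressed).hexdigest()
        store[name] = compressed  # identical blocks collapse into one entry
        mapping_table.append(name)
    return mapping_table, store


def read_block(name, store):
    """Check a block file against its SHA-1 name, then uncompress it."""
    data = store[name]
    if hashlib.sha1(data).hexdigest() != name:
        raise ValueError("block file does not match its SHA-1 name")
    return zlib.decompress(data)
```

Because the name is derived from the contents, two identical blocks produce one stored file, and a corrupted block file is detected when it is mapped.
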
  10. Construct a virtual disk of LBCAS on a Client PC
      [Figure: construction of the virtual disk on a client PC; surviving label: OS]
  11. Structure of LBCAS
      • Storage Cache
         – Suppresses downloads
      • Memory Cache
         – Suppresses disk access and uncompression
  12. LBCAS (2/2)
      • When a file is updated or created on the original block device, the relevant block files are newly created with new SHA-1 file names. The mapping table file is also renewed.
         – Old block files are reusable.
      • HTTP for file delivery
         – Most popular and well designed for the Internet.
            • Utilizes inexpensive Web hosting services, proxies, and mirror servers for worldwide deployment.
      • Block files are network/storage transparent.
         – If the necessary block files are stored in local storage, a network connection is not necessary.
  13. Partial Update of LBCAS
      Before the update (map01.idx)      After “apt-get install …” (map02.idx)
         4ad36ffe8…                         4ad36ffe8…
         974daf34a…                         dd4daf34a…
         2d34ff3e1…                         2d34ff3e1…
         3310012a…                          3310012a…
         …                                  …
      [Figure: in both cases the block device (ext2, 4KB pages) is split into 256KB block files named by SHA-1 and served to the FUSE driver. Block files that are the same before and after the update are reusable. “Create Once, Use Many”.]
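
The reuse shown on this slide amounts to comparing two mapping tables. The sketch below is a hypothetical illustration of that comparison: the function names are assumptions, both tables are assumed to have the same length, and the entries are the truncated digests printed on the slide, used here only as labels.

```python
def blocks_to_download(old_table, new_table):
    """Block files named in new_table that the old image did not contain."""
    have = set(old_table)
    return sorted(set(new_table) - have)


def changed_indexes(old_table, new_table):
    """Positions whose block file name differs between the two tables."""
    return [i for i, (a, b) in enumerate(zip(old_table, new_table)) if a != b]


# Truncated names as printed on the slide (labels only, not full digests).
map01 = ["4ad36ffe8", "974daf34a", "2d34ff3e1", "3310012a"]
map02 = ["4ad36ffe8", "dd4daf34a", "2d34ff3e1", "3310012a"]
```

For the slide's example only one of the four listed block files has to be fetched after the update; the other three are already in the storage cache.
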
  14. Performance Issues
      • LBCAS is sensitive to access patterns.
      • 2 types of block size mismatch
         (1) Between the file system and LBCAS (static mismatch)
            • ext2/3: 4KB block size
            • LBCAS: 64KB-512KB block size
               – Occupancy (the rate of necessary data in a block file) is low.
                  » Kitagawa [LinuxKongress2006] reported that the occupancy was 30% for KNOPPIX 3.8.2 on 256KB LBCAS.
         (2) Between “readahead” (disk prefetch) and LBCAS (dynamic mismatch)
            • readahead: 4KB-128KB coverage size
            • LBCAS: 64KB-512KB block size
               – Many small accesses (worm-eaten access to a block file) cause redundant downloads and unnecessary uncompression in the LBCAS driver.
  15. CAUTION for readahead
      • Disk prefetch “readahead” and system call “readahead”
         – The system call “readahead” populates the page cache with data from a file. Thus, the whole data of a file is stored in the page cache. The coverage is the size of the file.
         – It is not directly related to the disk prefetch, but it achieves the same function from user space.
         – Some boot procedures use the system call “readahead”. The files that are loaded into the page cache in advance at boot time are listed in “/etc/readahead/boot,desktop”. We call this function “u-readahead” in this presentation.
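
A user-space prefetch of the kind described above can be approximated from Python. Note the substitution: this sketch does not call readahead(2) itself, which the Python standard library does not expose; it uses `os.posix_fadvise` with `POSIX_FADV_WILLNEED`, which asks the kernel for a similar effect on Unix. The function name `prefetch` and its return value are assumptions for illustration.

```python
import os


def prefetch(paths):
    """Hint the kernel to load each file into the page cache.

    Returns the paths that could be opened and hinted.
    """
    done = []
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # missing or unreadable entries are skipped
        try:
            if hasattr(os, "posix_fadvise"):
                # Offset 0 with length 0 means "the whole file".
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            done.append(path)
        except OSError:
            pass  # the hint is advisory; ignore files the kernel refuses
        finally:
            os.close(fd)
    return done
```

A boot script would pass the file names read from a list such as the one mentioned on the slide; as the slide notes, the coverage of such a request is the whole file, independent of the kernel's disk prefetch window.
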
  16. Block size mismatch
      • Solution (increasing locality of reference)
         1. (For static mismatch) Increase occupancy by reallocating the necessary data within a block file.
         2. (For dynamic mismatch) Extend the coverage size of readahead through sequential access and a high page cache hit rate.
      • “ext2/3optimizer” repacks the data blocks of an ext2/3 file system so that they are arranged in line.
         – The repacking is based on the block access profile at boot time.
         – As a result, ext2/3optimizer reduces the number of block files.
  17. Occupancy in a block file of LBCAS
      • Occupancy (the rate of necessary data in a block file) depends on which data is needed.
      • “Worm-eaten” access (readahead) causes redundant downloads of block files.
      [Figure: read order ①②③ through three layers: files on the ext2/3 file system (4KB blocks, block search), disk access via readahead (4KB~128KB), and downloaded LBCAS block files (256KB). A page cache hit extends the coverage; a cache miss shrinks it. Occupancy is low, and redundant block files are downloaded.]
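
Occupancy can be computed directly from the set of accessed file system blocks. The sketch below assumes 4KB ext2/3 blocks and, by default, 256KB block files, both taken from the slides; the function name and the block-number representation are hypothetical.

```python
FS_BLOCK = 4 * 1024  # ext2/3 block size from the slides


def occupancy(accessed_fs_blocks, lbcas_block=256 * 1024):
    """Return (block files touched, fraction of their bytes actually needed)."""
    per_file = lbcas_block // FS_BLOCK  # file system blocks per block file
    needed = set(accessed_fs_blocks)
    if not needed:
        return 0, 0.0
    files = {b // per_file for b in needed}
    return len(files), len(needed) / (len(files) * per_file)


# 64 needed blocks, scattered (one per block file) versus gathered in line.
scattered = [i * 64 for i in range(64)]
gathered = list(range(64))
```

With the same 256KB of necessary data, the scattered layout touches 64 block files at an occupancy of 1/64, while the gathered layout touches a single block file at full occupancy. This is the effect that block reallocation aims for.
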
  18. Readahead and LBCAS 1/2
      • Readahead is a mechanism of disk prefetch. The data are saved to the page cache.
      • The coverage size is extended or shrunk according to the page cache hit rate.
      [Figure: two stages of a sequential read from an application. The I/O covers a current_window followed by an ahead_window (markers: start, ahead_start); as the sequential read continues, the windows are extended up to “max_readahead”.]
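
The extend-and-shrink behavior can be illustrated with a toy model. This is not the kernel's readahead algorithm: it assumes, purely for illustration, that the coverage doubles on each sequential read up to 128KB and falls back to 4KB on any non-sequential read (the 4KB-128KB range is the one quoted in the slides). The function name and parameters are hypothetical.

```python
MIN_KB, MAX_KB = 4, 128  # coverage range quoted in the slides


def coverage_trace(offsets_kb, read_kb=4):
    """Return the coverage size (KB) chosen for each read in a toy model."""
    trace = []
    window = MIN_KB
    expected = None  # offset that would continue the previous read
    for off in offsets_kb:
        if off == expected:
            window = min(window * 2, MAX_KB)  # sequential: extend
        else:
            window = MIN_KB                   # non-sequential: shrink
        trace.append(window)
        expected = off + read_kb
    return trace
```

In this model a purely sequential stream reaches the maximum coverage after a few reads, while a scattered stream stays at the minimum, which corresponds to the many small requests the slides call worm-eaten access.
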
  19. Readahead and LBCAS 2/2
      • When a readahead is issued, a part of a block file is required and mapped to the virtual disk. The size depends on the coverage size of the readahead.
         – Wide readahead is effective for the LBCAS driver.
      • When the same block file is required sequentially, the block file is kept in the memory cache of LBCAS and the uncompression is eliminated.
      [Figure: block files (D3E14…, 3B441…) are downloaded and LBCAS is mapped to the loopback device. With a small window (start, ahead_start, current_window, ahead_window), low occupancy causes an I/O size mismatch. As the sequential read from the application extends the windows to “max_readahead”, block file D3E14… is served from the LBCAS memory cache.]
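
The role of the memory cache can be sketched as a small cache of uncompressed blocks keyed by block file name. The class below is an assumption-laden illustration: the LRU eviction policy, the capacity, and the `load_compressed` callback are all hypothetical, since the slides do not describe how the LBCAS driver manages its cache.

```python
from collections import OrderedDict
import zlib


class MemoryCache:
    """Keep recently used uncompressed blocks; count uncompress operations."""

    def __init__(self, load_compressed, capacity=8):
        self.load_compressed = load_compressed  # name -> compressed bytes
        self.capacity = capacity                # assumed limit, in blocks
        self.blocks = OrderedDict()
        self.uncompress_count = 0

    def get(self, name):
        if name in self.blocks:
            self.blocks.move_to_end(name)       # mark as most recently used
            return self.blocks[name]
        data = zlib.decompress(self.load_compressed(name))
        self.uncompress_count += 1
        self.blocks[name] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used
        return data
```

Consecutive requests that fall in the same block file hit the cache and skip the uncompression, which is why wider, sequential readahead suits the driver.
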
  20. Readahead and Block Reallocation
      • Readahead can be improved by block reallocation of the file system, if the page cache hit rate is increased.
      • Defrag tools look as if they would work well …
         – Unfortunately, current defrag tools are not suitable, because they are developed from the viewpoint of file defragmentation.
      • We developed “ext2/3optimizer”, which reallocates the data blocks of ext2/3 based on an access profile.
         – It also increases occupancy in a block file.
  21. Access profile and reallocation
      [Figure: two I/O stacks, before and after reallocation. Each stack: App (user) → VFS → File System Driver (ext2/3) → Page Cache (memory) → Block Driver (Loopback) → Device. A profiler in the kernel exports the access profile via /proc/ to ext2/3optimizer, which reallocates the blocks on the device. Before: blocks are scattered, and readahead is small and frequent (worm-eaten access). After: blocks are gathered, and readahead is sequential access.]
  22. Block Relocation: Ext2/3optimizer [LinuxKongress06]
      • Data blocks are rearranged so that they are in line. The structure of the metadata is not changed.
      • The arrangement is based on the access profile.
      • Features:
         – The normal driver is used.
         – Fragmentation occurs from the viewpoint of files.
         – The relocation increases page cache hits, so readahead extends its coverage size.
      [Figure: i-node structure before and after relocation (Mode, Owner info, Size, Timestamps, Direct Blocks, Indirect Blocks, Double Indirect, Triple Indirect). After relocation, occupancy is high and readahead is widened.]
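
The core idea of profile-based relocation can be shown as a block-number permutation. This is a toy sketch, not ext2/3optimizer: it ignores system blocks and the rewriting of block pointers in i-nodes and indirect blocks that a real tool has to perform, and it assumes every profiled block number is below `total_blocks`. The function name is hypothetical.

```python
def relocation_map(access_profile, total_blocks):
    """Map old block number -> new block number, profiled blocks first."""
    order = []
    seen = set()
    for block in access_profile:  # keep first-access order from the profile
        if block not in seen:
            seen.add(block)
            order.append(block)
    # Blocks that never appear in the profile follow in their original order.
    order.extend(b for b in range(total_blocks) if b not in seen)
    return {old: new for new, old in enumerate(order)}
```

For a profile that reads blocks 7, 2, 7, 5 on an 8-block device, blocks 7, 2, and 5 move to positions 0, 1, and 2, so the boot-time reads become one sequential run.
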
  23. Performance Analysis
      • Confirm the effect of ext2/3optimizer on LBCAS for booting.
         – Ubuntu 9.04 (2.6.28) installed on ext3 (8GB) with KVM-60.
            • The ext3 was optimized by ext2/3optimizer for the boot profile.
            • The disk image was translated to LBCAS (64KB - 512KB).
      • Compared configurations
         – Normal
         – u-readahead: user-level readahead (system call) for booting
         – ext2/3optimizer
  24. Static Analysis by DAVL (Disk Allocation Viewer for Linux)
      [Figure: two block allocation maps. normal: Fragmentation 0.21%. ext2/3opt: Fragmentation 1.11%. Legend: system block, non-contiguous block, contiguous block.]
  25. Utilization of I/O
      • BootChart showed the utilization of I/O.
         – u-readahead caused a spike of I/O.
      [Figure: BootChart I/O utilization for normal, u-readahead (I/O spike), and ext2/3opt (reduced I/O).]
  26. Dynamic Analysis: Disk Access at boot time
      • Ext2/3optimizer relocates the data blocks that are required at boot time to the top of the virtual disk.
      [Figure: accessed address (GB, 0 to 8.0) against time (s). Red: normal. Blue: ext2/3opt.]
  27. Trace of readahead coverage size
      [Figure: three plots of readahead coverage size (0KB to 128KB) against time (0 to 60 s), for normal, u-readahead, and ext2/3opt. Annotation on ext2/3opt: fewer small readaheads.]
  28. Frequency for each readahead coverage
      • Ext2/3optimizer reduced small “readahead”.
      [Figure: frequency against request size (KB; axis ticks 0, 32, 64, 128).]
  29. Volume Transition on processing level
                                                 normal          u-readahead      ext2/3opt
      Volume of files (number, average)          203MB (2,248 Av: 92KB)
                                                 76MB (67%)
      Volume of required blocks                  127MB
      Volume of access which includes            208MB (+81MB)   231MB (+104MB)   140MB (+13MB)
      coverage of readahead
         frequency                               6,379           5,827            2,129
         average size                            33KB            41KB             67KB
      Annotations on the slide: 1/2 (volume), 1/3 (frequency), 2 (size)

      • Volume of downloaded block files in MB, (uncompressed MB), Occupancy % (127MB / uncompressed MB)
      LBCAS size   normal               u-readahead          ext2/3opt
      64KB         86.1 (247), 51.5%    93.4 (272), 46.9%    55.3 (144), 88.7%
      128KB        96.8 (290), 43.9%    104 (315), 40.3%     55.3 (149), 85.3%
      256KB        114 (358), 35.5%     123 (386), 35.0%     55.6 (159), 80.0%
      512KB        144 (474), 26.9%     153 (508), 25.1%     55.6 (176), 71.8%
  30. Consumed time in LBCAS
      [Figure: two stacked bar charts of time (s), each with four bars per configuration. 512KB was not efficient for any optimization.]
      Chart 1, Time (s):      normal             u-readahead        ext2/3opt
         first row            43  43  42  37     43  43  45  38     45  45  46  44
         second row           13  13  13  20     14  14  12  19      7   6   6   7
      Chart 2, Time (s):      normal             u-readahead        ext2/3opt
         first row            5.0 6.5 9.0 14.0   5.2 6.7 7.3 11.4   2.5 2.8 3.5 4.8
         second row           5.7 4.6 4.7 3.1    6.6 5.8 2.9 4.5    3.6 2.7 1.7 1.1
  31. Total download of LBCAS
      • Ext2/3opt reduced the necessary block files (256KB).
      • The system call “readahead” downloaded the required files in advance. It caused an I/O spike. It also included redundant data.
      [Figure: download (MB, 0 to 140) against time (s). + normal, □ u-readahead, × ext2/3opt.]
  32. Frequency of function in LBCAS
      • I/O requests are independent of LBCAS.
      • R: requests. D: download. S: storage cache. U: uncompress. M: memory cache. ①②③: files per request.
      • R = ①+②+③, D+S = U, U+M = ①+②*2+③*3
                     R       D       S       U       M       ①       ②       ③
      normal (Av: 33KB)
         64KB        6,338   3,958   1,663   5,621   3,647   4,148   1,450   740
         128KB       6,381   2,321   1,729   4,050   3,793   4,919   1,462
         256KB       6,379   1,435   1,748   3,183   3,908   5,667   717
         512KB       6,395   848     1,769   2,717   4,019   6,054   341
      u-readahead (Av: 41KB)
         64KB        5,825   4,344   1,172   5,516   3,626   3,537   1,259   1,029
         128KB       5,834   2,526   1,200   3,726   3,761   4,181   1,653
         256KB       5,827   1,544   1,179   2,723   3,908   5,023   804
         512KB       5,822   1,015   1,172   2,187   4,023   5,434   388
      ext2/3opt (Av: 67KB): download is reduced, uncompress is reduced
         64KB        2,165   2,296   626     2,922   1,311   941     380     844
         128KB       2,148   1,189   593     1,782   1,398   1,116   1,032
         256KB       2,129   634     576     1,210   1,409   1,639   490
         512KB       2,132   353     517     870     1,520   1,874   258
  33. Discussions
      • Weak points of ext2/3optimizer
         – The reallocation is customized for booting. Other applications may be adversely affected.
            • I guess the boot procedure is special and has no strong relation to other applications.
         – The reallocation is customized for a certain version. When a part of the boot procedure is updated, we have to re-optimize the image.
  34. Conclusions
      • “ext2/3optimizer” is a strong tool for utilizing “readahead”, because it reallocates the data blocks that are used by the boot procedure.
         – It increased the occupancy (rate of necessary data in a block file) of LBCAS block files.
         – It doubled the coverage of readahead and halved the number of readaheads.
      • “ext2/3optimizer” is not only for LBCAS. It can be used for normal Linux distributions.
  35. Summary
      Some services are available. Just try them!
         http://openlab.jp/oscircular/
      EXT2/3optimizer developers
         http://unit.aist.go.jp/itri/knoppix/ext2optimizer/index-en.htm
      DAVL developers
         http://sourceforge.net/projects/davl/
      BootChart
         http://www.bootchart.org/