ASPLOS2011 workshop RESoLVE "Effect of Disk Prefetching of Guest OS "



  1. 1. Effect of Disk Prefetching of Guest OS on Storage Deduplication
      Kuniyasu Suzaki †, Toshiki Yagi †, Kengo Iijima †, Cyrille Artho †, Yoshihito Watanabe ††
      † Research Center for Information Security
      ††
  2. 2. Motivation (1/2)
      • A normal OS installed in a fully virtualized environment assumes that it is running on real devices.
      • Do the optimization techniques of the operating system work well for virtual devices?
        – Virtual devices are developed to deliver native performance, but most of them have their own restrictions, which are not hidden from the point of view of performance.
      • Should the guest OS adapt to virtual devices using traditional optimization techniques?
  3. 3. Motivation (2/2)
      • Our approach is not to develop a para-virtualized device driver or to use I/O passthrough.
      • Our approach
        – The guest OS recognizes the features of the virtual device and adjusts its behavior to them.
          • Current OSes have many optimization techniques and tools.
  4. 4. Our targets
      • Virtual device (storage)
        – CAS: Content Addressable Storage
          • Manages a virtual block device with deduplication.
          • CAS has its own restrictions: the occupancy problem, size mismatch, and the alignment problem.
      • Guest OS: Linux
        – readahead: the disk prefetch mechanism in the Linux kernel
        – The system call "readahead" is a different function.
        – Block reallocation of the file system
          • A kind of defrag tool. We developed "ext-optimizer", which reallocates data blocks using an access profile.
  5. 5. CAS: Content Addressable Storage
      • Data is addressed not by its physical location but by a unique name derived from its content (usually a secure hash).
      • Identical contents are represented by one original copy (same hash); the other occurrences are addressed through an indirect link (storage deduplication).
        – Plan9 has Venti [USENIX FAST02]
        – Data Domain (EMC) Deduplication [USENIX FAST08]
        – LBCAS (Loopback Content Addressable Storage) [LinuxSymp09]
      [Figure: a virtual disk is indexed into a CAS storage archive. Chunks with the same SHA-1 are shared (deduplication); a new block is created with a new SHA-1.]
        Address             SHA-1
        0000000-0003FFF     4ad36ffe8…
        0004000-0007FFF     974daf34a…
        0008000-000BFFF     2d34ff3e1…
        000C000-000FFFF     974daf34a…
        …                   …
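The indexing shown on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the LBCAS implementation: the `ChunkStore` class, its method names, and the 16KB chunk size (chosen only because the address ranges in the figure each span 0x4000 bytes) are assumptions made for the example.

```python
import hashlib

# Assumed for illustration: each address range in the figure spans 0x4000 bytes.
CHUNK_SIZE = 16 * 1024


class ChunkStore:
    """Toy content-addressable store: chunks are keyed by the SHA-1 of their content."""

    def __init__(self):
        self.chunks = {}  # SHA-1 hex digest -> chunk bytes (each content stored once)
        self.index = []   # virtual chunk number -> SHA-1 hex digest

    def write_image(self, image: bytes) -> None:
        """Split a disk image into fixed-size chunks and index them by hash."""
        for off in range(0, len(image), CHUNK_SIZE):
            chunk = image[off:off + CHUNK_SIZE]
            digest = hashlib.sha1(chunk).hexdigest()
            # Identical content maps to the same digest, so it is stored only once.
            self.chunks.setdefault(digest, chunk)
            self.index.append(digest)

    def read(self, chunk_no: int) -> bytes:
        """Resolve a virtual chunk number to its content through the index."""
        return self.chunks[self.index[chunk_no]]


# Four virtual chunks; the second and fourth have identical content,
# as in the figure where two address ranges share one SHA-1.
store = ChunkStore()
image = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"C" * CHUNK_SIZE + b"B" * CHUNK_SIZE
store.write_image(image)
print(len(store.index), "virtual chunks,", len(store.chunks), "stored chunks")
```

Writing a modified chunk would produce a different digest and therefore a new stored entry, which corresponds to the "new block is created with new SHA-1" note in the figure.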
  6. 6. Optimization for Disk Access
      • Disk prefetch "readahead"
        – The Linux kernel has a disk prefetch mechanism called "readahead". Prefetched data are stored in memory (the page cache). The coverage size of the prefetch changes dynamically with the hit rate of the page cache.
      • System call "readahead"
        – It is not directly related to the disk prefetch mechanism, but it achieves the same function from user space.
        – The system call "readahead" populates the page cache with the whole data of a file. Thus, the whole file is stored in the page cache.
          • This is not efficient from the point of view of prefetching.
        – We refer to this function as "u-readahead" in this presentation.
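A user-space prefetch of a whole file can be sketched as follows. Python's standard library does not expose the Linux `readahead` system call directly, so this example swaps in `os.posix_fadvise` with `POSIX_FADV_WILLNEED`, a related hint that asks the kernel to bring a file range into the page cache. The function name `prefetch_whole_file` and the temporary demo file are hypothetical.

```python
import os
import tempfile


def prefetch_whole_file(path: str) -> int:
    """Hint the kernel to load the whole file into the page cache; return the file size."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # posix_fadvise is not available on every platform, so guard the call.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, size, os.POSIX_FADV_WILLNEED)
        return size
    finally:
        os.close(fd)


# Demo on a throwaway 8KB file (a stand-in for a file needed at boot).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 8192)
    demo_path = tmp.name
demo_size = prefetch_whole_file(demo_path)
os.remove(demo_path)
```

Because the hint covers the file from offset 0 to its full size, every block of the file is requested whether or not it is needed, which is the inefficiency the slide points out.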
  7. 7. Performance Issues on CAS
      • Two types of block size mismatch
        (1) Between the file system and LBCAS (static mismatch)
          • ext2/3: 4KB block size
          • LBCAS: 64KB-512KB chunk size
            – Occupancy (the rate of necessary data in an LBCAS chunk) is low.
              » Kitagawa [LinuxKongress2006] reported that the occupancy was 30% for KNOPPIX 3.8.2 on 256KB LBCAS.
        (2) Between readahead and LBCAS (dynamic mismatch)
          • readahead: 4KB-128KB coverage size
          • LBCAS: 64KB-512KB chunk size
            – Size mismatch
              » A small readahead causes low occupancy.
              » A large readahead requires many LBCAS chunks for one access.
            – Alignment problem
              » When a readahead crosses an LBCAS chunk boundary, a redundant chunk is required.
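The size mismatch and the alignment problem come down to integer arithmetic on offsets. The sketch below is illustrative only: the helper names are made up, and the occupancy it computes is for a single access, whereas the occupancy reported in the talk is aggregated over a whole boot.

```python
KB = 1024


def chunks_touched(offset: int, length: int, chunk_size: int) -> int:
    """Number of fixed-size chunks that the byte range [offset, offset + length) overlaps."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return last - first + 1


def access_occupancy(offset: int, length: int, chunk_size: int) -> float:
    """Fraction of the fetched chunk bytes that the access actually asked for."""
    fetched = chunks_touched(offset, length, chunk_size) * chunk_size
    return length / fetched


# Small readahead: a 4KB request still fetches one whole 256KB chunk.
small = access_occupancy(0, 4 * KB, 256 * KB)

# Large readahead: an aligned 128KB request needs two 64KB chunks.
large = chunks_touched(0, 128 * KB, 64 * KB)

# Alignment problem: the same 128KB request fits in one 256KB chunk when
# aligned, but needs a second, mostly unused chunk when it crosses a boundary.
aligned = chunks_touched(0, 128 * KB, 256 * KB)
crossing = chunks_touched(192 * KB, 128 * KB, 256 * KB)
```

In the crossing case the request pulls in 512KB of chunk data for 128KB of useful data, so the per-access occupancy drops to 0.25.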
  8. 8. Access mismatch in chunk of LBCAS
      • Occupancy (the share of necessary data in a chunk) depends on where the necessary data is located.
      • A large readahead requires many chunks.
      • When an access crosses an LBCAS alignment boundary, a redundant chunk is allocated.
      [Figure: access path from files, through block search in the ext2/3 file system (4KB access requests) and disk access via readahead (4KB~128KB), to LBCAS chunks (256KB). Small readahead: occupancy is low. Large readahead: many chunk searches and allocations for one access. Access across an alignment boundary: redundant chunk.]
  9. 9. Solution
      1. (For static mismatch) Increase occupancy by reallocating the necessary data into the same LBCAS chunk.
      2. (For dynamic mismatch) Keep the coverage size of readahead large through sequential access and a high page-cache hit rate.
      • Both increase locality of reference.
      • "ext-optimizer" repacks the data blocks of an ext2/3 file system so that they are in line.
        – The repacking is based on the block access profile.
        – As a result, ext-optimizer increases the occupancy and keeps the cache hit rate constantly high through sequential access.
  10. 10. Ext-optimizer: Access profile and reallocation
      [Figure: the same stack (App in user space; VFS, ext2/3 file system driver, page cache in memory, and loopback block driver in the kernel; device) shown before and after reallocation. Before: a profiler records the access profile, which is read via /proc/; data blocks are scattered, and readahead is small with many ("worm-eaten") accesses. After: ext-optimizer has reallocated the blocks so that they are gathered, and readahead is sequential access.]
  11. 11. Block Relocation: Ext-optimizer [LinuxKongress06]
      • Data blocks are rearranged to be in line. The structure of the metadata is not changed.
      • The arrangement is based on the access profile.
      • Features:
        – The normal driver is used.
        – Fragmentation occurs from the point of view of individual files.
        – The relocation increases page-cache hits, so readahead extends its coverage size.
      [Figure: an inode (Mode, Owner info, Size, Timestamps, Direct Blocks, Indirect Blocks, Double Indirect, Triple Indirect) shown before and after relocation; after relocation the referenced data blocks are gathered, giving high occupancy.]
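The effect of profile-based relocation on chunk usage can be illustrated with a toy model. This is not ext-optimizer itself, which moves blocks on disk and rewrites the block pointers held in the metadata; the first-access ordering policy, the function names, and the sample trace below are assumptions made for the sketch.

```python
def build_relocation(access_trace):
    """Map each accessed block number to a new position, in order of first access."""
    mapping = {}
    for block in access_trace:
        if block not in mapping:
            mapping[block] = len(mapping)  # pack at the top of the disk
    return mapping


def chunks_needed(blocks, blocks_per_chunk):
    """Number of distinct chunks that hold the given block numbers."""
    return len({b // blocks_per_chunk for b in blocks})


# Hypothetical boot-time access profile (4KB block numbers, with repeats).
trace = [900, 17, 4500, 17, 901, 2300, 64, 4501]
mapping = build_relocation(trace)

BLOCKS_PER_CHUNK = 16  # 64KB chunk / 4KB block
before = chunks_needed(set(trace), BLOCKS_PER_CHUNK)
after = chunks_needed(set(mapping.values()), BLOCKS_PER_CHUNK)
print(f"chunks needed: {before} before, {after} after relocation")
```

The seven distinct blocks are spread over five chunks before relocation and fit in one chunk afterwards, which is the occupancy gain the slide describes. Blocks that belong to the same file may end up apart from the rest of that file, matching the note that fragmentation appears from the file's point of view.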
  12. 12. Performance Analysis
      • Confirm the effect of ext-optimizer on LBCAS for guest OS booting.
        – Ubuntu 9.04 (kernel 2.6.28) installed on ext3 (8GB) with KVM-60.
          • The ext3 file system was optimized by ext-optimizer using a boot profile.
          • The disk image was converted to LBCAS (64KB - 256KB).
      • Compared configurations
        – normal
        – u-readahead: user-level readahead (system call) for booting
        – ext-optimizer
  13. 13. Disk Image Analyzed by DAVL (Disk Allocation Viewer for Linux)
      [Figure: DAVL block maps of the two disk images, with contiguous and non-contiguous blocks marked. normal: fragmentation 0.21%. ext2/3opt: fragmentation 1.11%; the data blocks used at system boot are arranged in line.]
  14. 14. Disk Access Trace at boot time
      • Ext-optimizer relocates the data blocks required at boot time to the top of the virtual disk.
      [Figure: disk access trace plotting time (s) against address (GB, 0 to 8.0). Red: normal. Blue: ext2/3opt.]
  15. 15. Histogram of Access for readahead coverage
      • Ext-optimizer reduced the number of small readaheads.
      [Figure: histogram of frequency against coverage size of readahead (KB; axis marks at 0, 32, 64, 128).]
  16. 16. Amount of data on each processing level
                                              normal       u-readahead   ext2/3opt
      Amount of files (number, average)       203MB (2,248 files, average 92KB)
      Amount of required blocks               127MB
      Amount of disk access, including
      coverage of readahead                   208MB        231MB         140MB
        count                                 6,379        5,827         2,129
        average coverage size                 33KB         41KB          67KB

      Amount of required chunks (MB) and occupancy (%) (127MB / amount of chunks in MB)
      LBCAS size                              normal       u-readahead   ext2/3opt
      64KB                                    247, 51.5%   272, 46.9%    144, 88.7%
      128KB                                   290, 43.9%   315, 40.3%    149, 85.3%
      256KB                                   358, 35.5%   386, 35.0%    159, 80.0%
      512KB                                   474, 26.9%   508, 25.1%    176, 71.8%
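The derived columns on this slide follow from simple ratios, which the short check below reproduces. The figures are copied from the slide; the function and variable names are made up for the example, and because the inputs are rounded to whole megabytes, recomputed occupancy values can differ slightly from the percentages printed on the slide.

```python
MB_IN_KB = 1024

# (total disk access in MB, number of accesses), as given on the slide.
disk_access = {
    "normal": (208, 6379),
    "u-readahead": (231, 5827),
    "ext2/3opt": (140, 2129),
}


def average_coverage_kb(total_mb: float, count: int) -> float:
    """Average readahead coverage per access, in KB."""
    return total_mb * MB_IN_KB / count


def occupancy_percent(required_mb: float, chunk_mb: float) -> float:
    """Share of the fetched chunk data that was actually required, in percent."""
    return 100.0 * required_mb / chunk_mb


avg = {name: round(average_coverage_kb(mb, n)) for name, (mb, n) in disk_access.items()}
print(avg)  # {'normal': 33, 'u-readahead': 41, 'ext2/3opt': 67}

# 64KB LBCAS with ext2/3opt: 127MB required out of 144MB of chunks.
print(f"{occupancy_percent(127, 144):.1f}%")
```

The average coverage size roughly doubles from normal to ext2/3opt while the number of accesses falls to about a third.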
  17. 17. Discussion
      • This talk leaves out the effect of deduplication itself; that effect is not high on a single disk image, even if the chunk is small.
        – Deduplication is effective when merging updated images.
        – Performance is more important.
      • Memory on a virtual machine also has deduplication mechanisms (Difference Engine [OSDI'08], Satori [USENIX'09], etc.). The guest OS should adjust its behavior to them as well.
        – SLINKY [USENIX05] and our paper [HotSec10] utilize memory deduplication for security.
  18. 18. Conclusion
      • Virtual devices have their own restrictions, which are not hidden from the point of view of performance.
      • The guest OS should recognize the features of the virtual device and adjust its behavior to the virtual device with traditional optimization techniques.
      • We showed an example for CAS (Content Addressable Storage) with disk prefetching and block reallocation.