FRA: a flash-aware redundancy array of flash storage devices

Yangsup Lee (Samsung Electronics Co., Ltd., yangsup.lee@samsung.com)
Sanghyuk Jung, Yong Ho Song (Department of Electronics and Computer Engineering, Hanyang University, Korea, {shjung, yhsong}@enc.hanyang.ac.kr)

CODES+ISSS'09, October 11-16, 2009, Grenoble, France. Copyright 2009 ACM 978-1-60558-628-1/09/10.

ABSTRACT
Since flash memory has many attractive characteristics such as high performance, non-volatility, low power consumption and shock resistance, it has been widely used as storage media in embedded and computer system environments. In the case of reliability, however, there are many shortcomings in flash memory: potentially high I/O latency due to erase-before-write and poor durability due to limited erase cycles. To overcome these problems, a RAID technique borrowed from storage technology based on hard disks is employed. With RAID technology, multi-bit burst failures in a page, block or device are easily detected and corrected, so the reliability can be significantly enhanced. However, the existing RAID-5 scheme for flash-based storage suffers from increased response time due to parity updating. To overcome this problem, we propose a novel approach to using a RAID technique in flash storage, called Flash-aware Redundancy Array. In this approach, parity updates are postponed so that they are not included in the critical path of read and write operations. Instead, they are scheduled for when the device becomes idle. For example, the proposed scheme shows a 19% improvement in the average write response time compared to other approaches.

Categories and Subject Descriptors
B.3.2 [Memory Structures]: Mass Storage; D.4.2 [Operating Systems]: Garbage collection

General Terms
Management, Measurement, Performance, Reliability

Keywords
Storage Systems, Flash Memory, Flash Translation Layer, RAID

1. INTRODUCTION
Flash memory has many attractive characteristics such as high performance, non-volatility, low power consumption and shock resistance. It has been widely used as storage media in a variety of embedded and computer systems. Recently, flash-based storage systems such as SSDs (Solid State Drives) have been substituting traditional hard disks in server systems as their performance improves continuously. However, there are many shortcomings in flash memory which still limit the rapid proliferation of flash-based storage: potentially high I/O latency due to erase-before-write and poor durability due to limited erase cycles. Flash memory may contain bad blocks from the time of manufacture: if data are written into such blocks, the reliability of the data is not guaranteed. When a block has been erased beyond the cycles allowed, it becomes unusable. Reliability is therefore an important issue that needs to be properly addressed in building flash-based storage systems.

To overcome this problem, the Flash Translation Layer (FTL) of flash-based storage systems implements many protective measures. One of them is to store an Error Correction Code (ECC) in the spare area of flash blocks, the extra space per page used to maintain a variety of bookkeeping information. When a page is retrieved from the flash, it is accompanied by the ECC stored in the spare area. From the page, a new ECC is calculated and compared against the retrieved one to detect and correct bit errors before the page is forwarded to the host. However, recent flash memory tends to yield more bit errors than such an ECC mechanism can handle as silicon technology evolves.

Another approach is to employ the RAID technique borrowed from storage technology based on hard disks. The rationale behind this approach is related to the organization of flash-based storage: many flash memories are accommodated in a single storage device to implement high-capacity, high-performance storage, which in turn increases the probability of failure. Ideally, when RAID technology is used in flash storage, multi-bit burst failures in a page, block or device are easily detected and corrected, so the reliability can be significantly enhanced.

However, the implementation of RAID technology in flash storage is not straightforward because of the unique characteristics of flash memory such as erase-before-write and asymmetric read/write performance. If not properly implemented, the RAID technique can significantly deteriorate the storage performance. In particular, whenever a page is updated, the other pages in the stripe need to be read to calculate a new parity page, and the new parity must be stored in the flash memory, which should be avoided if possible.

In this paper, we propose a novel approach to using a RAID technique in flash storage, called Flash-aware Redundancy Array (FRA, in short).
This approach allows page, block or device failures to be recovered using a RAID-5 scheme, while incurring less timing overhead in read and write operations in the flash memory. In our approach, parity updates are postponed so that they are not included in the critical path of read and write operations. Instead, they are scheduled for when the device becomes idle. Simulation results show that the proposed scheme improves RAID-based flash storage by up to 44% compared to other approaches. One of the reasons for this performance benefit is that multiple delayed parity updates to the same page can be coalesced into one before they are presented to the flash memory.

The rest of this paper is organized as follows. In Section 2, we present the preliminaries of flash memory based storage technology, including FTL and RAID schemes. Section 3 contrasts the related works and Section 4 describes the motivation of this paper. The organization and operation details of FRA are described in Section 5. The performance of the proposed technique is compared against others in Section 6. Finally, Section 8 concludes the paper.

2. BACKGROUND

2.1 Flash Memory
Flash storage is often compared with hard disks regarding many important aspects. First, flash memory uses the page as its read/write unit, which used to be equal to a sector (512 bytes) but is now 2 Kbytes or 4 Kbytes for most flash devices. The discrepancy of the read/write unit, along with no support for in-place updates, results in the introduction of an FTL to flash storage. This software layer is responsible for mapping a logical sector address from a host to a physical location in the flash memory. Depending on the granularity of the mapping operation, such mapping algorithms are classified into page-level mapping, block-level mapping, and hybrid mapping [6][15]. (See Section 2.2 for details.)

Second, unlike hard disks, each page of flash memory consists of a data area to store user data and a spare area used to hold metadata associated with the user data, such as the ECC, the logical sector addresses corresponding to the user data, etc.

Third, flash memory does not support in-place updates. When a page is updated, it must be erased before it can be re-written with the new data. Furthermore, the unit of the erase operation is a block that consists of multiple pages, typically 64 or 128 as shown in Table 1. Therefore, when a block is erased, the valid pages in the block must be safely saved for further accesses.

Table 1. Characteristics comparison of SLC and MLC
Type | Erase cycles | Pages in a block | Program time
SLC  | 100,000      | 64               | 200us
MLC  | 10,000       | 128              | 800us

The attempt to increase the capacity of flash devices has introduced the technique of storing multiple bits in a cell. Such a device is called an MLC (Multi-Level Cell) flash. However, the capacity increase is achieved at the cost of the durability and performance of the flash device. As summarized in Table 1, traditional SLC (Single Level Cell) devices hold only one bit per cell, with 10 times more erase cycles and 4 times higher write performance compared to MLC. Due to their low cost and high density, MLCs are often preferred by storage vendors. The decrease in reliability necessitates the use of error detection and correction techniques in MLC.

2.2 FTL (Flash Translation Layer)
FTL is a software layer that provides a hard disk interface to upper layer software (e.g. file systems) by dynamically associating logical sector addresses with physical locations in the flash device. There are three common features in an FTL: address mapping, wear-leveling and garbage collection. Address mapping translates a given logical sector address from the host to a corresponding physical page address of the flash memory. In the case of a write, the physically addressed page should be clean, i.e., erased. There are three types of mapping scheme: block-level mapping, page-level mapping and hybrid mapping. Among these, hybrid mapping employs a logging technique to temporarily hold updates to user data in the flash memory, which contributes to good write performance and a small mapping table size. Thus, it is especially useful in embedded systems. Some examples of such hybrid mapping include BAST (Block-Associative Sector Translation) [9], FAST (Fully-Associative Sector Translation) [5], SAST (Set-Associative Sector Translation) [6], Superblock FTL [10] and MC_SPLIT [7].

Flash memory uses a wear-leveling scheme to increase its lifetime by erasing blocks in the flash memory as evenly as possible [11]. However, strict enforcement of wear-leveling may result in an unfruitful increase of the erase count in the flash memory.

Garbage collection reclaims space by erasing obsolete blocks. It chooses a victim block to reclaim, copies any valid pages to another block and performs an erase operation on the victim. This operation prepares a clean block for further write operations so that a write can be finished without having to perform an expensive erase before the write. Therefore, garbage collection is closely connected with the write performance of flash storage.
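To make the log-block (hybrid) mapping described above concrete, the following is a minimal C sketch of how such an FTL might service a page write. The structure layout and the helper names (find_or_alloc_log_block, flash_program_page, merge_log_block) are illustrative assumptions, not the actual BAST implementation.

```c
#include <stdint.h>

#define PAGES_PER_BLOCK 64   /* large-block SLC, as in Table 1 */

/* One log block temporarily absorbs updates destined for a single data block. */
struct log_block {
    uint32_t data_lbn;                   /* logical block it shadows                  */
    uint32_t pbn;                        /* physical block number of the log block    */
    uint16_t next_free;                  /* next clean page offset in the log block   */
    int16_t  page_map[PAGES_PER_BLOCK];  /* logical offset -> log page, -1 if absent  */
};

/* Assumed allocator and low-level helpers (not part of BAST itself). */
extern struct log_block *find_or_alloc_log_block(uint32_t lbn);
extern void flash_program_page(uint32_t pbn, uint16_t page, const void *data);
extern void merge_log_block(struct log_block *lb);  /* copy valid pages, erase victim */

/* Hybrid-mapping write: redirect the update into a log block instead of
 * erasing and rewriting the data block in place (erase-before-write). */
void ftl_write_page(uint32_t lpn, const void *data)
{
    uint32_t lbn = lpn / PAGES_PER_BLOCK;    /* logical block number      */
    uint16_t off = lpn % PAGES_PER_BLOCK;    /* page offset in that block */

    struct log_block *lb = find_or_alloc_log_block(lbn);
    if (lb->next_free == PAGES_PER_BLOCK) {  /* log block full: merge, then retry */
        merge_log_block(lb);
        lb = find_or_alloc_log_block(lbn);
    }
    flash_program_page(lb->pbn, lb->next_free, data);
    lb->page_map[off] = (int16_t)lb->next_free++;  /* newest copy wins on later reads */
}
```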
2.3 RAID
RAID (Redundant Arrays of Inexpensive Disks), based on the magnetic disk technology developed for personal computers, offers an attractive alternative to SLED (Single Large Expensive Disk), promising improvements of an order of magnitude in performance, reliability, power consumption, and scalability [3]. There are several RAID levels, differing in their relative cost, reliability and performance.

2.3.1 RAID-0
RAID-0 was developed for building a large storage capacity using arrays of inexpensive disks. In this approach, user data are interleaved over the disks so that both disk capacity and data transfer rates increase. Recently, RAID-0 has been mainly used for achieving high performance rather than a large capacity. However, this approach does not improve reliability: when one of the disks malfunctions, it is impossible to retrieve the user data from the storage. Therefore, it is used when access performance is the only goal.
2.3.2 RAID-4
To overcome the reliability challenge, RAID-4 makes use of an extra disk, which is responsible for holding the redundant information necessary to recover user data when a disk fails [3]. RAID-4 stripes data at the block level across several drives, with parity stored on one drive. This approach provides high read performance, as in RAID-0, due to parallel reads to multiple disks. However, write performance is poor due to the overhead caused by writing the additional parity to the disk. One drawback of RAID-4 is that the parity disk carries too much traffic, because parity must be written for every write.

Figure 1. RAID architectures: (a) RAID Level 4, (b) RAID Level 5

2.3.3 RAID-5
While RAID-4 achieves high read performance due to parallel accesses to disks, its write performance suffers from the excessive write accesses aggregated on the parity disk. To remedy this problem, RAID-5 distributes user data and parity across disks as shown in Figure 1. For example, when the data in D1 and D4 are updated, Disks 0, 1, 3 and 4 may become busy in parallel or in an overlapped fashion, and thus no single disk remains a performance bottleneck.

However, in flash storage, if RAID-5 is used in the same way as in hard disks, it results in poor write performance because even a small random write may incur expensive write and erase operations.

3. RELATED WORKS
Many approaches have aimed to improve the performance of flash storage when RAID-5 is used. One such effort is REM (Reliability Enhancing Mechanism), which uses RAID-5 with several MLC flash memories to increase the reliability of the storage [1]. Because this approach uses a native RAID-5 scheme, it inherits all the strengths and weaknesses of RAID-5. In general, RAID-5 write requests are handled using the following sequence:
1. Read the full stripe except the portion where the modification will take place.
2. Generate new parity information with the new and unmodified data.
3. Write the new user data to storage.
4. Write the new parity data to storage.
As indicated in the above sequence, in order to write new user data, RAID-5 needs to read the unmodified area in a stripe and generate a new parity. Finally, the new user data and parity have to be stored in the storage. Due to the overhead associated with writing the additional parity, write performance can be penalized by over 50% compared with the case of having no redundancy.
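This four-step sequence can be expressed as a short C sketch. Since parity is the XOR of all data pages in a stripe, reading the unmodified pages and XOR-ing them with the new data yields the new parity. The helpers read_page and write_page, the page size and the stripe width are assumptions for illustration; this is not REM's code.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  2048   /* bytes per page (assumed for illustration)   */
#define STRIPE_LPS 4      /* data pages per stripe, plus one parity page */

/* Assumed helpers: logical page read/write on a given chip. The FTL
 * underneath handles out-of-place updates; this layer only sees pages. */
extern void read_page(int chip, uint32_t ppn, uint8_t *buf);
extern void write_page(int chip, uint32_t ppn, const uint8_t *buf);

/* Conventional RAID-5 small-write, following the four steps in the text:
 * read the unmodified pages of the stripe, XOR them with the new data to
 * form the new parity, then write the data page and the parity page. */
void raid5_small_write(const int data_chips[STRIPE_LPS], uint32_t stripe_ppn,
                       int parity_chip, uint32_t parity_ppn,
                       int target_idx, const uint8_t *new_data)
{
    uint8_t parity[PAGE_SIZE];
    uint8_t tmp[PAGE_SIZE];

    memcpy(parity, new_data, PAGE_SIZE);                 /* seed with the new data        */
    for (int i = 0; i < STRIPE_LPS; i++) {
        if (i == target_idx)
            continue;                                    /* step 1: skip the modified LP  */
        read_page(data_chips[i], stripe_ppn, tmp);       /* step 1: read unmodified data  */
        for (int b = 0; b < PAGE_SIZE; b++)
            parity[b] ^= tmp[b];                         /* step 2: accumulate new parity */
    }
    write_page(data_chips[target_idx], stripe_ppn, new_data);  /* step 3: write user data */
    write_page(parity_chip, parity_ppn, parity);                /* step 4: write parity   */
}
```

The extra page reads and the parity write in this path are exactly the overhead that the delayed scheme in Section 5 removes from the critical path.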
Figure 2. Data and code organization of SBSS

Another approach is SBSS (Self-Balancing Striping Scheme), a striping scheme for flash storage that reduces read latency [2]. Because of the conflict between the localities of reads and writes, the benefit of parallelism is severely limited. To solve this problem, a redundancy allocation policy and a request-scheduling algorithm are presented to realize the idea. The redundancy allocation regularly disperses a piece of data over two other chips, as shown in Figure 2. The data is completely interleaved on three different banks to allow load balancing. Thus, if some chips have many pending flash operations, a read request can be redirected to another chip that holds the associated parity data. As a result, read latency can be reduced. However, this approach needs to update two parities for every write to user data. Therefore, write performance can be decreased by over 70% compared with the case where no redundancy is used. Furthermore, the redundancy space is as large as the data area, which reduces space efficiency.

There has also been an attempt to use RAID-4 to enhance fault tolerance [12]. This mechanism uses NVRAM to amortize the poor write performance caused by frequently updated parity. Most parities are updated in NVRAM for write requests, and the parity is flushed to the flash memory once it is complete. As a result, it handles both the problem of frequently updated parity on a specific device and the numerous parity write operations caused by partial write requests. However, it needs additional NVRAM to solve these problems, and it may have reliability issues regarding the NVRAM itself, because the write cycles of the NVRAM increase considerably due to the frequently updated parity.
4. IDEA IN A NUTSHELL
As explained above, RAID-5 has the problem of updating parity data even for small random writes. Furthermore, such parity updates may increase the latency of write operations. The idea here is quite straightforward. (1) Parity updates are performed after a write operation finishes. If there is sufficient idle time between flash accesses, this period can be utilized to store parity to flash, and thus to increase write performance. (2) Delayed parity updates can absorb frequent updates to hot data in the flash, which in turn contributes to reducing write operations. In this case, it contributes not only to performance, but also to reliability. The details of the delayed update scheme are discussed in Section 5.2.

It has been observed that idle time between disk accesses is much longer than the disk access time in the PC environment. Figure 3 illustrates (a) the distribution of idle time, and (c) and (d) its distribution when the disk activity of a PC running Windows XP is monitored using DiskMon [4]. Figure 3 (c) and (d) indicate that the idle time is longer than 10ms on average. Therefore, we can utilize idle time to update parity and improve write performance. However, we noticed that there is insufficient idle time in benchmark tools or huge-sized file copy operations.

Figure 3. Idle and disk access time: (a) idle time per sector, (c) idle time distribution of MS Office Installation, (d) idle time distribution of Internet Exploration

Figure 4. Idle time usage: (a) existing algorithm behavior, (b) FRA behavior

5. FRA (Flash-aware Redundancy Array)
The recent RAID-based approaches discussed in Section 3 have good characteristics such as high reliability and high read performance. However, they have traded write performance for such benefits. The degradation of write performance is caused by the overhead of writing parity to flash for better reliability. Some of the techniques lose 50% of the write performance due to such overheads. Therefore, minimizing this overhead is the key requirement for improving write performance.

The latency of a write request is the flash access time for the data plus the time to write its parity data to the flash memory. Write performance in FRA is expected to be almost the same as the case where no redundancy scheme is adopted, because the parity update is delayed and executed in a later idle period.

5.1 Overall Architecture
FRA is based on the RAID-5 architecture. In RAID-5, user data is striped over 4 different disks as in Figure 1(b), and parity data is stored on other disks. Therefore, user data and parity data can be stored at the same time. SSDs have recently adopted multiple channels, such as 4/8/16 channels, and flash memories share their I/O with other flash memories in the same channel. In FRA, the logical address layout is organized as shown in Figure 5. To reduce latency, user data and parity data are located in different chips, as in RAID-5 architectures.

Figure 5. Overall organization of FRA
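As a hedged illustration of this layout (the LP/LPG grouping is defined in Section 5.1.1 below), the following sketch maps a logical page number to a data chip and a rotating parity chip so that a page group and its parity never share a chip. The channel and chip counts and the exact rotation are assumptions chosen for illustration; the precise arrangement in Figure 5 may differ.

```c
#include <stdint.h>

#define NUM_CHANNELS 4   /* LPs per LPG equals the channel count (Section 5.1.1) */
#define NUM_CHIPS    8   /* eight flash memories, as in Figure 6                 */

struct placement {
    uint32_t lpg;         /* logical page group index                 */
    int      data_chip;   /* chip holding this logical page           */
    int      parity_chip; /* chip holding the parity page of this LPG */
};

/* Hypothetical FRA-style layout: the LPs of one LPG land on different chips,
 * and the parity chip rotates from LPG to LPG (RAID-5 style) so that parity
 * never shares a chip with the data it protects. */
struct placement fra_place(uint32_t lpn)
{
    struct placement p;
    p.lpg = lpn / NUM_CHANNELS;
    int lp_idx = (int)(lpn % NUM_CHANNELS);                     /* position inside the LPG */

    int base = (int)((p.lpg * (NUM_CHANNELS + 1)) % NUM_CHIPS); /* rotate layout per LPG   */
    p.data_chip   = (base + lp_idx) % NUM_CHIPS;
    p.parity_chip = (base + NUM_CHANNELS) % NUM_CHIPS;          /* disjoint from data chips */
    return p;
}
```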
The FRA allocates 20% of the total blocks to the redundancy area. In this case, four pages, each from a different device at the same offset, form a stripe, and their parity is stored on one of the devices in a rotating fashion. If we are to support reliability only for a fraction of the data area, we can save parity area and thus have more blocks for user data.

5.1.1 LP, LPG and Parity
An LPG (Logical Page Group) is a group composed of LPs (Logical Pages). The number of LPs in an LPG is the same as the number of channels, to maximize parallelism, and the LPs are sequentially ordered. A parity page is generated for each LPG. If the flash memory has problems such as read errors, we can recover the original user data using its parity, because the parity resides on another chip. That is why every LPG and its parity are located in different flash memories. Figure 6 describes the overall layout of LPs and LPGs with eight flash memories and 4 channels.

Figure 6. LP to LPG mapping

5.2 Delayed Write Scheme
FRA performs three different operations for a write request from a host. First, the user data is written into a log block in the flash memory. Second, the previously generated parity data associated with the LPG becomes invalid because of the newly arrived user data, so the FRA deletes the mapping entry of the log block as in Figure 7(a). Finally, the FRA inserts a parity update job into the PG (Parity Generation) queue so that new parity data can be generated later during idle time. When an idle period appears, the FRA pops the first entry of the PG queue, generates new parity data and writes it into the flash memory. This finishes a delayed write.

Figure 7. Delayed write scheme: (a) parity delete example, (b) delayed write example

By using the delayed write scheme, the FRA can reduce the parity write frequency for multiple write requests to the same area. In other words, it has the advantage of reducing the parity update count, which gives an effect similar to using NVRAM as in Building Reliable NAND Flash Memory Storage Systems [12]. Furthermore, the overall erase count can be reduced, which contributes to increasing the lifetime of the flash storage.

Table 2. Parity update time
Device type | Page size | Parity update time
SLC         | 4KB       | 420us
SLC         | 2KB       | 320us
MLC         | 4KB       | 450us

Assume that the PG queue holds a number of parity update entries caused by write requests. The FRA takes a parity update entry out of the PG queue, generates new parity data by reading the user data of the LPG from the flash memory, and writes the parity into the flash memory. Thus, a parity update takes the time to read the data pages from the 4 channels plus the time to program the new parity page into the flash memory. Table 2 shows the time required to generate parity data for each type of flash memory.

From Figure 3, we can see that 40 parity pages can be written on SLC flash with 4KB pages, because the arithmetic mean of the idle time per sector is 2,100us. This is sufficient to write parity to the flash memory during idle time.
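A minimal sketch of the delayed write path and the idle-time flush of the PG queue follows, under assumed structure and helper names (ftl_write_user_page, invalidate_parity, write_parity_for_lpg and device_is_idle are placeholders, not the authors' code). Repeated writes to the same LPG coalesce into a single pending job, which is where the reduced parity write count comes from.

```c
#include <stdbool.h>
#include <stdint.h>

#define LPS_PER_LPG    4     /* equals the number of channels (Section 5.1.1)   */
#define PG_QUEUE_DEPTH 128   /* assumed capacity of the parity-generation queue */

/* Pending parity-generation jobs, one entry per dirty LPG. */
static uint32_t pg_queue[PG_QUEUE_DEPTH];
static int pg_head, pg_count;

/* Assumed helpers provided by the FTL / flash driver. */
extern void ftl_write_user_page(uint32_t lpn, const void *data); /* log-block write       */
extern void invalidate_parity(uint32_t lpg);                     /* drop stale parity map */
extern void write_parity_for_lpg(uint32_t lpg);                  /* read LPs, XOR, program */
extern bool device_is_idle(void);

static bool pg_queue_contains(uint32_t lpg)
{
    for (int i = 0; i < pg_count; i++)
        if (pg_queue[(pg_head + i) % PG_QUEUE_DEPTH] == lpg)
            return true;
    return false;
}

/* Host write path: (1) write data to a log block, (2) invalidate the old
 * parity, (3) enqueue a parity-generation job for later.  Repeated writes
 * to the same LPG coalesce into a single pending job. */
void fra_write(uint32_t lpn, const void *data)
{
    uint32_t lpg = lpn / LPS_PER_LPG;

    ftl_write_user_page(lpn, data);
    invalidate_parity(lpg);
    if (!pg_queue_contains(lpg) && pg_count < PG_QUEUE_DEPTH)
        pg_queue[(pg_head + pg_count++) % PG_QUEUE_DEPTH] = lpg;
}

/* Drain the PG queue whenever the device detects an idle period. */
void fra_idle_tick(void)
{
    while (device_is_idle() && pg_count > 0) {
        uint32_t lpg = pg_queue[pg_head];
        pg_head = (pg_head + 1) % PG_QUEUE_DEPTH;
        pg_count--;
        write_parity_for_lpg(lpg);   /* roughly 420us per 4KB SLC parity page (Table 2) */
    }
}
```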
5.3 Dual Page Mapping Table in Log Block
The delayed write scheme has two problems. First, the PG queue can become full if there is no idle period between successive write operations. Second, the FRA cannot recover the original user data after a read error if the old parity data has already been deleted.

To overcome these problems, we suggest a dual page mapping table in the log block. Each mapping entry of the log block holds a logical page number (LPN) to physical page number (PPN) mapping, together with another physical page number (original PPN), which is the address of the old data of the LPN from which the existing parity was generated. With a dual page mapping table in the log block, the PG queue is no longer required, because the log block mapping table itself records the newly written logical addresses of the LPs. During an idle period, we can generate new parity data by looking up the log block mapping table. Furthermore, the FRA limits the number of newly updated pages in a log block; this means the maximum queue depth is the same as the maximum number of valid pages in the log blocks.

During idle time, if the PPN is not equal to the original PPN when the FRA looks up a log block mapping table, the generation of a new parity has not been completed yet, so a new parity needs to be generated.

If the FTL needs to perform a merge operation to prepare a free page for a write request, it chooses a victim block among the log blocks and copies the valid pages from the log block and the data block to a free block. In this merge operation, the FRA generates valid new parity data, and the original PPNs are updated with the PPNs of the pages in the log block.

Assume that a 32GB SSD uses SLC flash memory whose page size is 4KB. If we allocate ten log blocks per chip, a single page mapping table needs 10KB of RAM, and the FRA needs 20KB of RAM for the dual page mapping table of the log blocks. The dual page mapping scheme doubles the mapping table size compared to a single page mapping table, but it solves the two problems of the delayed write scheme.

Figure 8. Dual mapping table example

Figure 8 describes the operation of the dual page mapping table in log blocks. A write request arrived at LPN(21) and parity data was generated for LPN(21) during idle time; then further write requests to LPN(21), LPN(8) and LPN(10) arrived. For the first write request, the data of LPN(21) was written into PPN(36), and at that time the original PPN of LPN(21) was 33. The parity data of LPN(21) was generated and written into the flash memory during the idle period, so the PPN and the original PPN both became 36. After that, there was a further write request to LPN(21), so its mapping entry changed to PPN(37).

We need to choose one log block among the log blocks to generate new parity information during an idle period. In general, LRU (Least Recently Used) is an efficient mechanism in cache algorithms [13] and log-block management [14], because the oldest page is unlikely to be updated again. Therefore, FRA chooses the least recently used log block, generates new parity data for the pages whose parity is stale, writes the new parity data into the flash memory and updates the dual page mapping table.
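A hedged sketch of the dual page mapping table follows: each log-block entry keeps the current PPN and the original PPN whose data the stored parity still reflects, so a mismatch between the two marks a page whose parity must be regenerated during idle time. The 0xFFFF sentinel for unused entries follows Figure 8; the structure and helper names are illustrative assumptions.

```c
#include <stdint.h>

#define UNMAPPED 0xFFFFu   /* sentinel for unused entries, as in Figure 8 */

/* One entry of the dual page mapping table kept per log block page. */
struct dual_map_entry {
    uint16_t lpn;        /* logical page number                                    */
    uint16_t ppn;        /* physical page holding the newest data                  */
    uint16_t orig_ppn;   /* physical page the currently stored parity was built on */
};

/* Assumed helper: read the LPG of this LPN, XOR its pages, program new parity. */
extern void write_parity_for_lpn(uint16_t lpn);

/* Host write: record the new PPN but keep orig_ppn pointing at the old data,
 * so the stale parity can still be used for recovery until new parity exists. */
void dual_map_on_write(struct dual_map_entry *e, uint16_t lpn, uint16_t new_ppn)
{
    e->lpn = lpn;
    e->ppn = new_ppn;    /* orig_ppn is intentionally left unchanged here */
}

/* Idle-time scan: ppn != orig_ppn means parity generation is still pending. */
void dual_map_idle_update(struct dual_map_entry *tab, int n)
{
    for (int i = 0; i < n; i++) {
        if (tab[i].ppn == UNMAPPED || tab[i].ppn == tab[i].orig_ppn)
            continue;                       /* empty entry, or parity already current */
        write_parity_for_lpn(tab[i].lpn);   /* rebuild parity from the new data       */
        tab[i].orig_ppn = tab[i].ppn;       /* parity is now up to date               */
    }
}
```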
5.4 Power-off Recovery
Flash memory is used in various mobile devices, so an FTL should consider sudden power failures. The power-off recovery algorithm has two steps. First, the FTL finds the latest meta-data and loads it into RAM. Then, it recovers the mapping table of the log blocks by reading the spare areas, which contain the logical address information.

The FRA uses the dual page mapping table in the log block, but its overall design is otherwise unchanged compared to a log-block based FTL, so it can use a log-block based power-off recovery scheme such as PORCE (Power Off Recovery sChEme) [8]. However, it has to identify both the PPN and the original PPN while reconstructing the log block mapping table.

6. PERFORMANCE EVALUATION

6.1 Experiment Setup
We implemented a trace-driven simulator to evaluate performance. The simulator emulates a number of chips with multiple channels. We assumed large-block SLC flash with 2KB pages and 64 pages per block for the simulation. BAST was assumed as the FTL by default.

Table 3. Workload descriptions
Benchmark              | Description
Sandra File System     | Sequential, random and update pattern
ATTO                   | Sequential write and read for each size from 0.5KB to 1MB
MP3 File Copy          | Copy mp3 files
Movie File Copy        | Copy a movie file
MS Office Installation | MS Office 2003 installation
Internet Exploration   | Small size read and write pattern

For the simulation, we gathered various workloads using DiskMon in an MS Windows XP environment: Sandra File System, ATTO, MP3 File Copy, Movie File Copy, MS Office Installation and Internet Exploration. We also used a separate secondary storage device to eliminate the effects of the operating system's internal operations. Table 3 describes the workloads used in the simulation.
To evaluate write performance when idle time is insufficient, we used the Sandra File System, ATTO, MP3 File Copy and Movie File Copy workloads. To evaluate write performance under random-write-dominant workloads, we used the Sandra File System and Internet Exploration workloads, which have a very random write propensity. Figure 9 illustrates the disk access characteristics of the Sandra File System and Internet Exploration workloads.

The FRA implements the dual page mapping table in the log block on top of BAST, and allocates 20% of the total space to the redundancy area to guarantee reliability for all user data. To compare the write performance of the FRA with other algorithms, we implemented a RAID-5 algorithm modeled on REM. This RAID-5 has a buffering scheme to reduce parity write overhead in sequential workloads, and it is also based on BAST. SBSS was not considered because its write performance is lower than RAID-5 and, in that approach, each piece of user data is associated with two parity pages.

In this experiment, we used the same FTL algorithms to observe how write performance is affected by the different redundancy algorithms, and all experiments were performed with the same number of flash memories and channels. The FTL code execution time and FTL meta-data management overhead were not considered because we were focusing on the redundancy policy.

Figure 9. Workload characteristics: (a) Internet Exploration workload, (b) Sandra File System workload

6.2 Experimental Results

6.2.1 Performance Comparison
Figure 10 (a) shows the total execution time of BAST, RAID-5 and FRA for the workloads in Table 3. The y-axis is the total operation time for each workload, in seconds. The performance of BAST is the best among them because it does not provide a redundancy mechanism against read failures. However, FRA is better than RAID-5 for all workloads, including ATTO, MP3 File Copy, Sandra File System and Movie File Copy, which have less idle time. The difference in performance between FRA and RAID-5 is determined by the number of parity updates and the amount of idle time. In RAID-5, every write operation to the log block has its own parity write operation. FRA, however, can write its parity data into the flash memory during idle time or during a merge operation, which means that workloads with strong locality allow it to reduce the parity write frequency even further.

Figure 10 (b) shows the performance of BAST, FRA and RAID-5 normalized to BAST. On average, RAID-5 achieved about 56% of the performance of BAST and FRA about 69%. FRA achieved 74% of BAST for the Internet Exploration and MS Office Installation workloads, which have enough idle time to update parity data.

Figure 11 (a) shows the proportion of invalid pages of user data in the log blocks for each workload. Most workloads have invalid pages in log blocks caused by overwrites to the same location. In particular, the ATTO workload has 10% overwrites because the ATTO benchmark writes user data with small transfer sizes (512 bytes to 1MB) and mis-aligned sector numbers. With such characteristics, FRA can reduce the number of parity write requests and the frequent parity overwrites for each workload using the delayed write scheme, as shown in Figure 11 (b) and Figure 11 (c). In other words, FRA can reduce the number of parity writes in mis-aligned or strongly localized workloads. Therefore ATTO, which has a high proportion of overwritten parity data, shows a greater performance improvement than MP3 File Copy, Sandra File System and Movie File Copy. Furthermore, FRA shows a higher performance improvement for MS Office Installation and Internet Exploration, in which idle time is plentiful. This means that idle time is an important factor for improving performance.

Table 4. Normalized performance improvement
Benchmark              | RAID-5 | FRA
Internet Exploration   | 1      | 1.55
MS Office Installation | 1      | 1.41
ATTO                   | 1      | 1.17
MP3 File Copy          | 1      | 1.11
Sandra File System     | 1      | 1.14
Movie File Copy        | 1      | 1.11

Table 4 shows the performance improvement of FRA relative to RAID-5. FRA increases performance by about 20% on average compared to RAID-5 across all workloads. The improvement is lower for MP3 File Copy and Movie File Copy, which are sequential workloads; in other words, the delayed write effect of FRA is reduced for sequential workloads.
Figure 10. Performance results: (a) total execution time, (b) execution time normalized to BAST

Figure 11. Results related to parity generation: (a) proportion of invalid pages in the log block for user data, (b) proportion of invalid pages in the log block for parity data, (c) total number of parity writes, (d) total number of reads required to generate new parity

6.2.2 Erase Count Comparison
Figure 12 shows the erase count comparison of blocks in the flash memory for each workload. FRA delays parity write operations until its log block undergoes a merge operation; therefore, the erase count can be reduced by lowering the parity write frequency. In particular, FRA not only increases write performance but also reduces the erase count, as shown in Figure 12, for the ATTO workload, which has many overwrites of user and parity data as shown in Figure 11 (a) and (b).

Figure 12. Erase count comparison

6.2.3 Merge Comparison
Figure 13 (a) shows the number of merge operations and the types of merge operations. There are three types of merge operations: swap merge, partial merge and full merge. Since the swap merge has the smallest cost among them, the storage system with the highest number of swap merge operations achieves good write performance.

The FRA achieves a decreased number of full merges by reducing the parity write frequency and by using the delayed write approach.
We can notice that the overall merge count decreases, while the swap merge proportion increases. However, log block utilization is degraded due to the reduced parity write frequency. Figure 13 (b) shows the log block utilization.

Figure 13. Merge and log block utilization: (a) number of merges by merge type, (b) log block utilization

7. ACKNOWLEDGEMENTS
This work was developed within the scope of the Human Resource Development Project for IT SoC Architecture. This research has also been supported by the Nano IP/SoC Promotion Group of the Seoul R&BD Program in 2009.

8. CONCLUSIONS
The RAID-5 technique is a well-known approach for increasing the reliability of traditional disks. However, as mentioned above, the existing RAID-5 scheme for flash-based storage suffers from increased response time due to parity updating. To overcome this problem, we have proposed a flash-aware redundancy array technique called FRA. In FRA, parity updates are postponed so that they are not included in the critical path of read/write operations. Instead, they are scheduled for when the device becomes idle. Simulation results have shown that the proposed scheme improves RAID-based flash storage by up to 19% compared with other approaches.

9. REFERENCES
[1] Soraya Zertal, "A Reliability Enhancing Mechanism for a Large Flash Embedded Satellite Storage System", Third International Conference on Systems (ICONS 08), 2008.
[2] Yu-Bin Chang and Li-Pin Chang, "A Self-Balancing Striping Scheme for NAND-Flash Storage Systems", Proceedings of the 2008 ACM Symposium on Applied Computing, 2008.
[3] D. A. Patterson, G. Gibson, and R. H. Katz, "A Case for Redundant Arrays of Inexpensive Disks (RAID)", ACM SIGMOD Record, 1988.
[4] Disk Monitor for Windows v2.01, http://technet.microsoft.com/en-us/sysinternals/bb896646.aspx.
[5] S.W. Lee, D.J. Park, T.S. Park, D.H. Lee, S.W. Park, and H.J. Song, "A Log Buffer-Based Flash Translation Layer Using Fully-Associative Sector Translation", ACM Transactions on Embedded Computing Systems, vol. 6, no. 3, 2007.
[6] C.I. Park, W.M. Cheon, Y.S. Lee, M.S. Jung, W.H. Cho, and H.B. Yoon, "A Re-configurable FTL (Flash Translation Layer) Architecture for NAND Flash based Applications", 18th IEEE/IFIP International Workshop on Rapid System Prototyping, IEEE, 2007.
[7] J.H. Kim, S. Jung, and Y.H. Song, "Cost and Performance Analysis of NAND Mapping Algorithms in Shared-bus Multi-chip Configuration", IWSSPS'08, 2008.
[8] Tae-Sun Chung, Myungho Lee, Yeonseung Ryu, and Kangsun Lee, "PORCE: An Efficient Power Off Recovery Scheme for Flash Memory", Journal of Systems Architecture, vol. 54, no. 10, 2008.
[9] Jesung Kim, Jong Min Kim, Sam H. Noh, Sang Lyul Min, and Yookun Cho, "A Space-Efficient Flash Translation Layer for CompactFlash Systems", IEEE Transactions on Consumer Electronics, vol. 48, no. 2, 2002.
[10] Jeong-Uk Kang, Heeseung Jo, Jin-Soo Kim, and Joonwon Lee, "A Superblock-based Flash Translation Layer for NAND Flash Memory", EMSOFT'06, ACM, 2006.
[11] Dawoon Jung, Yoon-Hee Chae, Heeseung Jo, Jin-Soo Kim, and Joonwon Lee, "A Group-Based Wear-Leveling Algorithm for Large-Capacity Flash Memory Storage Systems", CASES'07, ACM, 2007.
[12] Kevin M. Greenan, Ethan L. Miller, and Darrell D.E. Long, "Building Reliable NAND Flash Memory Storage Systems", International Workshop on Large-Scale NVRAM Technology, 2008.
[13] H. Jo, J. Kang, S. Park, J. Kim, and J. Lee, "FAB: Flash-Aware Buffer Management Policy for Portable Media Players", IEEE Transactions on Consumer Electronics, 2006.
[14] C. Park, P. Talawar, D. Won, M.J. Jung, J.B. Im, S. Kim, and Y. Choi, "A High Performance Controller for NAND Flash-based Solid State Disk (NSSD)", IEEE Non-Volatile Semiconductor Memory Workshop (NVSMW), 2006.
[15] S. Jung, J. Kim, and Y. Song, "Hierarchical Architecture of Flash-based Storage Systems for High Performance and Durability", Proceedings of the 46th Annual Design Automation Conference (DAC), 2009.
