

EVALUATION OF GARBAGE COLLECTION TECHNIQUES ON FLASH TRANSLATION LAYER

Jae Geuk Kim, Min Choi, Seung Ryoul Maeng
Division of Computer Science, Department of Electrical Engineering and Computer Science, KAIST, 373-1 Guseong-dong Yuseong-gu, Daejeon, 305-70 South Korea
{jgkim, mchoi, maeng}@camars.kaist.ac.kr

ABSTRACT

Flash memory is becoming increasingly popular for designing and building embedded systems because of its shock resistance, low power consumption, and non-volatility. Because flash memory is a write-once, bulk-erase medium, a garbage collection mechanism in the Flash Translation Layer (FTL) is needed to provide applications with a transparent, high-bandwidth storage service. In this paper, we propose and implement a FAT-aware log-based Flash Translation Layer that performs garbage collection at two points in time. We also propose two versions of a victim selection policy, which selects a log block and invalidates it. The proposed victim selection policies are compared in terms of effectiveness and overhead through a series of experiments on our implemented system.

KEY WORDS
Embedded Systems, Flash Memory, Flash Translation Layer, Garbage Collection

1. Introduction

Flash memory is today the most popular storage device for embedded systems and mobile devices such as notebook computers and PDAs, and this trend is expected to accelerate as flash capacity grows exponentially. Its popularity is attributed to several advantages. First, flash memory is non-volatile, its most attractive feature for mobile devices that depend on limited battery capacity. Second, flash memory is lighter and smaller than other storage media such as hard disks, and it is more resistant to physical shock. Third, it has none of the mechanical delay that is the critical drawback of a hard disk: flash memory can be accessed in a relatively short time and offers good read and write performance even under random access patterns.

Generally, there are two kinds of flash memory, NAND and NOR. NOR flash supports byte-granularity I/O and fast reads, but its writes are slow, so it is typically used as memory for the code area. NOR flash can thus substitute for byte-addressable main memory except for its slow writes; a BIOS chip is a typical example of NOR flash usage. NAND flash, on the other hand, performs I/O in units of pages or sectors. NAND reads are slower than NOR reads, but its write cycle time is relatively shorter. NAND is used for large-capacity storage systems and is expected to substitute for hard disks; these days it has become the more popular part of the flash memory market.

Large block NAND flash has the advantages mentioned above, but it also has two major limitations when used as a storage system: flash memory is (1) write-once and (2) bulk-erased. Because flash memory is write-once, written data cannot be overwritten in place. Instead, new data must be written to available space elsewhere, and the old data is invalidated and marked as erasable. As a result, the physical location of data changes over time because of this out-of-place update scheme. A bulk erase may be initiated when the flash storage contains a large number of valid and invalid data mixed together. A bulk erase can involve a significant number of valid-data copies, since the valid data in the region to be erased must first be copied elsewhere; this process of reclaiming the space occupied by invalid data is called garbage collection.

In past work, various techniques were proposed to improve the performance of garbage collection for flash memory, e.g., [1, 2, 3]. In particular, Kawaguchi et al. proposed the cost-benefit policy [1], which uses a value-driven heuristic function as a block-recycling policy. Chiang et al. [3] refined this work by considering locality in the run-time access patterns. Kwoun et al. [2] proposed periodically moving live data among blocks so that blocks wear more evenly.
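The out-of-place update scheme described above can be illustrated with a small simulation. This is an illustrative sketch only; the names and the page-state model are assumptions, not the paper's implementation:

```c
#include <assert.h>

/* Illustrative sketch of out-of-place update: pages are never
 * overwritten in place. A write goes to a free page, and the old
 * copy is merely marked invalid, to be reclaimed later (at erasure
 * block granularity) by garbage collection. */

#define NUM_PAGES 8

enum page_state { FREE, VALID, INVALID };

struct flash {
    enum page_state state[NUM_PAGES];
    int data[NUM_PAGES];
    int map[NUM_PAGES];   /* logical page -> physical page, -1 if unmapped */
};

static int find_free_page(const struct flash *f)
{
    for (int p = 0; p < NUM_PAGES; p++)
        if (f->state[p] == FREE)
            return p;
    return -1;            /* no free space: garbage collection would run here */
}

/* Write (or update) a logical page; returns the physical page used. */
int flash_write(struct flash *f, int logical, int value)
{
    int p = find_free_page(f);
    if (p < 0)
        return -1;
    if (f->map[logical] >= 0)
        f->state[f->map[logical]] = INVALID;  /* old copy becomes garbage */
    f->data[p] = value;
    f->state[p] = VALID;
    f->map[logical] = p;
    return p;
}
```

Updating the same logical page twice leaves the first physical copy in the INVALID state; since erasing can only reclaim whole erasure blocks, these invalid pages accumulate until garbage collection copies the remaining valid data away and erases the block.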
Although researchers have proposed excellent garbage collection policies, little work has been done on providing deterministic performance guarantees for flash memory storage systems. [4] showed that, without proper management, garbage collection can impose almost 40 seconds of blocking time on time-critical tasks. [5] proposed a predictable garbage collection mechanism that provides deterministic garbage collection performance in time-critical systems.

The rest of this paper is organized as follows: Section 2 introduces the FAT-aware log-based Flash Translation Layer that we implemented. Section 3 presents our garbage collection technique, including the victim selection policies. The proposed policies are evaluated in Section 4. Section 5 concludes and discusses future research.

2. FAT-aware log-based FTL

2.1 Mappings and the FAT file system

This section describes the characteristics of page mapping and block mapping, the two basic implementation methods in an FTL, and the relevant features of the FAT file system. Page mapping suits workloads with many small random writes, which cause heavy fragmentation, because any page can be mapped anywhere in flash memory. This scheme would be very attractive if it did not require a large mapping table in RAM. The block mapping scheme was proposed to reduce this memory overhead. It suits sequential writes that access large blocks at once, and it does not need much memory to maintain its table, but manipulating its indexes is more complicated than in page mapping.

The other point we considered is the characteristics of the FAT file system, because most embedded systems, such as MP3 players, use FAT for user convenience. A FAT file system basically has two areas: the FAT area, which sees many random accesses at small block sizes, and the DATA area, which tends to receive mostly sequential writes.

2.2 Overall Design

Given the characteristics of the two mapping approaches and of the FAT file system, it is natural to manage the FAT area with page mapping and the DATA area with block mapping. Two issues must be considered, however. First, page mapping is feasible only for a few blocks because of the size of its mapping table, whereas block mapping can cover much larger regions. This issue is not serious, because the FAT area is a small fraction of the whole file system. Second, we must determine the block sizes used for page mapping and block mapping. For block mapping in particular, if the offset is too long, the offset table (originally maintained in units of 512 bytes) becomes too burdensome. In our FTL design, page mapping therefore operates on 512-byte units and block mapping on 4K-byte units. The 4K-byte block-mapping unit maximizes the utilization of the OOB area for offset processing; details are presented in Section 2.2.2.

Figure 1 shows the page mapping and block mapping structure implemented in our FTL.

Figure 1: Overall design of the FAT-aware log-based FTL. The FAT area uses page mapping with 512-byte blocks, while the DATA area uses block mapping with 4K-byte blocks.

For mapping between logical and physical blocks, we basically adopt block mapping over 128K-byte erasure blocks. As described in Section 2.2.1, page mapping also uses this block mapping indirectly. As shown in Figure 1, the block mappings form two large areas: the FAT area, managed with page mapping, and the DATA area, managed with block mapping only.

2.2.1 Mapping for the FAT area

The FAT area adopts a page mapping scheme operated in 512-byte units; each page is mapped to a specific logical block in the overall block mapping table, as shown in Figure 2. For read operations, the logical block number is derived directly from the page mapping; with this logical block number and the offset, the index of the physical block containing the real data is obtained. For write operations, a variable named log_pos is needed. In the FAT area, this variable records the last position of written log data; log_pos indexes 512-byte units and is used to compute the offset of a newly written block. It increases by one after each 512-byte block write.

The size of this area can be computed from the boot sector, which is written when the area is formatted as a FAT file system. However, the actual format process touches all the erasure blocks in the FAT area, so continuous cleaning would be needed when the file system is mounted. To solve this problem, we allocate additional log blocks in the page mapping area.
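The role of log_pos in FAT-area writes can be sketched as follows. This is an illustrative model, not the paper's code; the structure names, the map size, and the log-block capacity are assumptions (only the 512-byte unit and the increment-per-write behavior come from the text above):

```c
#include <assert.h>

/* Sketch of the FAT-area log write: each 512-byte write is appended
 * to a log block at the slot given by log_pos, and the page mapping
 * is updated to point at the new copy. log_pos advances by one
 * 512-byte unit per write, as described in Section 2.2.1. */

#define UNITS_PER_LOG_BLOCK 256   /* assumed: 128K erasure block / 512 bytes */

struct fat_area {
    int log_pos;                  /* next free 512-byte slot in the log */
    int page_map[1024];           /* logical 512-byte page -> log slot */
};

/* Append one 512-byte page write to the log; returns the slot used,
 * or -1 when the log block is full and must be garbage-collected. */
int fat_log_write(struct fat_area *fat, int logical_page)
{
    if (fat->log_pos >= UNITS_PER_LOG_BLOCK)
        return -1;                /* log full: cleaning needed */
    int slot = fat->log_pos++;    /* log_pos increases by one per write */
    fat->page_map[logical_page] = slot;
    return slot;
}
```

Because the log is append-only, rewriting the same logical page simply consumes a new slot and redirects the mapping, which is exactly why the number of available log blocks governs how often cleaning must run.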
Because the number of log blocks is flexible, the time garbage collection spends compacting logical blocks can be reduced by increasing it.

Figure 2: Detailed mapping scheme of the FAT and DATA areas. The FAT area adopts page mapping, while the DATA area adopts block mapping.

2.2.2 Mapping for the DATA area

The DATA area adopts a block mapping scheme operated on 4K-byte blocks, as shown in Figure 2. Unlike the FAT area, it basically uses one-to-one mappings and can use at most 7 log blocks; this limit of 7 is due to the restricted size of the OOB area, which holds the offsets. For read operations, the FTL first reads the offset table, which contains the position of the requested block; the position of the block holding the offset table is calculated from the offset indicated in the block mapping structure. Because every logical block in the DATA area maps directly to at least one physical block, if the offset of the requested block is between 0 and 31, the real data is in that directly mapped physical block. Otherwise, the real data is in one of the 7 log blocks. The number 32 is derived from the 128K-byte erasure block size divided by the 4K-byte logical block size.

3. Garbage Collection Technique

A typical flash memory chip has three operations: page read, page write, and block erase. The performance of these three operations supported by the NAND flash simulator of the Memory Technology Device (MTD) subsystem is listed in Table 1.

                   Page Read       Page Write      Block Erase
                   (2048 bytes)    (2048 bytes)    (128K bytes)
Performance (µs)   127             281             2,000

Table 1: Performance of the NAND flash memory operations (page read, page write, and block erase). These data are from the basic MTD definitions.

According to Table 1, reading or writing a page takes much less time than erasing a block. We can also calculate the time taken by the major garbage collection operations. Garbage collection consists of valid-copies and block erasures. A valid-copy merges a logical block that is scattered across the physical blocks into a single erasure block, so the time one valid-copy takes is proportional to the number of valid blocks that must be copied. For example, on large block NAND flash with 4K-byte blocks, an erasure block contains 32 blocks (64 flash pages), since an erasure block is 128K bytes. In the worst case, where all the scattered blocks are valid, one valid-copy takes about 13 ms, computed as (127 + 281) × 32 µs, while one block erasure takes only 2 ms. In this paper, we aim to eliminate the valid-copy overhead by performing it during idle periods; the FAT-aware log-based Flash Translation Layer we implement adopts this approach.

The rest of this section is organized as follows: Section 3.1 states the assumptions that specify our environment. Section 3.2 introduces our fundamental garbage collection policy. Section 3.3 discusses implementation issues. Section 3.4 presents our victim selection policies; in the next section, we evaluate these policies on our implemented system.

3.1 Assumptions

Before designing an efficient victim selection policy, we make the following assumptions. (1) These days, MP3 players are generally based on the FAT file system, so we assume the underlying file system of our system is FAT. (2) We also focus on the behavior of MP3 player users: a user may write some MP3 files at once, then read or write a few directories, or do nothing at all. We define the wait-time as such an intermediate period in which nothing is copied, and we assume some wait-time exists between user actions.

3.2 Garbage Collection Policy

We have assumed that wait-time exists. In the FAT-aware log-based FTL, the most important job of garbage collection is to provide more log blocks quickly when there is not enough time to do so on demand. To achieve this, we add a thread to our FTL that performs garbage collection operations. As presented at the start of this section, garbage collection consists of two operations: valid-copies and block erasure. Because valid-copies are more time-consuming than block erasure, our thread performs them whenever there is wait-time.

However, there may not be enough wait-time to complete all the valid-copies. In that case, selecting a victim block inappropriately causes unnecessary valid-copies. The victim selection policy is therefore the most important issue in our garbage collection technique.
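The DATA-area read lookup of Section 2.2.2 can be sketched as follows. This is a simplified model under stated assumptions: the offset-table encoding for values at or above 32, and all names, are mine; only the 0–31 direct range, the 32-blocks-per-erasure-block figure, and the 7-log-block limit come from the paper:

```c
#include <assert.h>

/* Sketch of the DATA-area lookup: each logical block maps one-to-one
 * to a physical erasure block holding 32 4K-byte blocks (128K / 4K).
 * An offset in 0..31 means the data is still in that direct block;
 * otherwise the data lives in one of up to 7 log blocks. */

#define BLOCKS_PER_ERASE 32   /* 128K-byte erasure block / 4K-byte block */
#define MAX_LOG_BLOCKS   7    /* limited by the OOB area holding offsets */

struct data_lookup {
    int direct_block;   /* physical erasure block mapped one-to-one */
    int log_block;      /* which log block holds the data, if any */
    int log_offset;     /* 4K slot inside the located block */
    int in_log;         /* 0: data in direct block, 1: data in a log block */
};

/* offset is the value read from the offset table for the requested
 * 4K block. Values 0..31 point into the direct block; larger values
 * are assumed here to encode (log block, slot) pairs. */
struct data_lookup data_locate(int direct_block, int offset)
{
    struct data_lookup r = { direct_block, -1, -1, 0 };
    if (offset < BLOCKS_PER_ERASE) {
        r.log_offset = offset;                   /* data in direct block */
    } else {
        int slot = offset - BLOCKS_PER_ERASE;
        r.in_log = 1;
        r.log_block = slot / BLOCKS_PER_ERASE;   /* assumed encoding */
        r.log_offset = slot % BLOCKS_PER_ERASE;
    }
    return r;
}
```

The two-level shape of the lookup (direct block first, log blocks as an overflow area) is what lets most reads resolve in a single offset-table access while still permitting out-of-place updates.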
3.3 Implementation

We now describe how to determine whether the file system or FTL is idle. The key data structure is the request queue. In Linux, most file systems use request queues, which give the file system access to a given block device. If there is a read or write request in the queue, the device is not idle. As mentioned before, we implement an additional garbage collection thread. The thread performs valid-copies whenever the request queue of the flash block device is empty. Given enough time, the thread invalidates all the log blocks, meaning every log block can be erased and reused.

Because we use a thread, we must consider synchronization between the garbage collection performed in the thread and in the FTL itself. We solve this simply with FTL-wide locks, specifically spinlocks.

3.4 Victim Selection Policies

Our implemented FTL performs garbage collection at two points. (1) Implicit garbage collection is implemented in the write function: when there is no space left for incoming data, the FTL must make space through valid-copies and block erasure. Because implicit garbage collection is unavoidable, it should finish the pending request transaction in minimum time. (2) Explicit garbage collection is implemented in the thread: whenever there is wait-time, it performs as many valid-copies as possible.

Before presenting the victim selection policies we propose, we make some definitions:

    I(t, i) = 1 if there exists a valid block of the i-th logical block in the t-th log block, and 0 otherwise.

Considering I(t, i) over all log blocks, we define the number of Different (Diff) blocks:

    Diff(t) = sum over all logical blocks i of I(t, i)

The t-th log block holds valid fragments of Diff(t) logical blocks; in other words, Diff(t) merge operations are needed to invalidate and erase the t-th log block. In the rest of this section, we propose two victim selection policies based on Diff(t).

3.4.1 Small Different Block First (SDBF)

When it is time to perform garbage collection, the thread checks Diff(t) for every log block to select one and perform its valid-copies. Given k log blocks, SDBF finds the index t in {0, ..., k-1} that minimizes Diff(t), i.e.,

    t = argmin over t' of Diff(t')

The strength of SDBF is its fast response time to read and write requests: a small Diff(t) means fewer valid-copies are needed to free a log block. However, SDBF can incur overhead when there is heavy fragmentation that will soon be invalidated anyway; the overhead is the thread performing unnecessary valid-copies.

3.4.2 Large Different Block First (LDBF)

LDBF takes the opposite approach, selecting the log block index t that maximizes Diff(t):

    t = argmax over t' of Diff(t')

LDBF incurs less of the overhead described for SDBF when there is heavy fragmentation that will soon be invalidated. However, because LDBF selects the maximum Diff(t), the thread spends a long time on valid-copies to free a single log block.

In this paper, we adopt the SDBF victim selection policy for implicit garbage collection because of its fast request response time. For explicit garbage collection, SDBF and LDBF may perform differently. In the next section, we evaluate SDBF against LDBF under different environments (wait-times), analyze the results, and identify the victim selection policy best suited to a given environment.

4. Performance Evaluation

In this section, we evaluate the victim selection policies on the FAT-aware log-based Flash Translation Layer. We implemented the FTL on the Memory Technology Device (MTD) layer of the Linux 2.6.13 kernel, with 7 log blocks for the DATA area and 2 log blocks for the FAT area. Virtual flash memory is created by the MTD NAND flash simulator. First, we compare the behavior of the system with and without explicit garbage collection. Then, we compare the performance of explicit garbage collection under the SDBF and LDBF policies. Finally, we show the overhead of each proposed case.

4.1 Simulation Workloads

A series of experiments was run on a real system. A set of tasks was created and executed to age the flash memory of our system.
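The Diff(t) metric and the two selection rules can be sketched directly from the definitions in Section 3.4. This is an illustrative sketch; the validity-bitmap representation and the block counts are assumptions:

```c
#include <assert.h>

#define NUM_LOG_BLOCKS 7    /* k log blocks, as in the DATA area */
#define NUM_LOGICAL    16   /* assumed number of logical blocks */

/* valid[t][i] plays the role of I(t, i): 1 if the t-th log block
 * holds a valid fragment of the i-th logical block, else 0. */

/* Diff(t) = sum over all logical blocks i of I(t, i). */
int diff(int valid[NUM_LOG_BLOCKS][NUM_LOGICAL], int t)
{
    int d = 0;
    for (int i = 0; i < NUM_LOGICAL; i++)
        d += valid[t][i];
    return d;
}

/* SDBF: victim = argmin over t of Diff(t).
 * LDBF: victim = argmax over t of Diff(t).
 * Ties keep the lowest index. */
int select_victim(int valid[NUM_LOG_BLOCKS][NUM_LOGICAL], int want_max)
{
    int best = 0;
    for (int t = 1; t < NUM_LOG_BLOCKS; t++) {
        int better = want_max ? (diff(valid, t) > diff(valid, best))
                              : (diff(valid, t) < diff(valid, best));
        if (better)
            best = t;
    }
    return best;
}
```

SDBF frees the log block with the fewest pending valid-copies, giving fast response, while LDBF picks the most fragmented block, trading a longer merge for fewer wasted copies when fragments are about to be invalidated anyway.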
The basic workload consisted of 20 MP3 files of about 4M bytes each. First, we created a virtual flash memory on MTD, modeling a large block NAND flash. We then created a FAT file system on it and mounted it at a predetermined position with the synchronization option, because if the file system is mounted without the synchronization option, read and write requests from the file system to the FTL are unpredictable due to buffer cache effects. After aging the flash memory sufficiently, we copied MP3 files with wait-time between the file copies.

Figure 3: Effectiveness of the explicit garbage collection. The graph compares the average copy time of the proposed victim selection policies (SDBF, LDBF, and no explicit GC) over wait-times from 0.0 s to 0.5 s.

Figure 4: The overhead of the proposed victim selection policies, measured as total block erasure count over the same wait-times. More block erasures mean more unnecessary block copies.

In the rest of this section, we present the results from two points of view: the effectiveness and the overhead of our proposed policies.

4.2 The Evaluation of the Victim Selection Policies

To demonstrate the effectiveness of our proposed victim selection policies, we measured the average time to copy the files. As mentioned before, implicit garbage collection always adopts SDBF, while explicit garbage collection adopts SDBF or LDBF. According to Figure 3, in terms of overall performance, enabling explicit garbage collection outperforms disabling it. The reason is that implicit garbage collection alone repeatedly reuses a small number of log blocks whenever it needs more space. The root cause of this starvation problem is that implicit garbage collection adopts SDBF: it selects and erases the log block chosen by SDBF and then fills it; once the block fills up and its Diff(t) is small enough, SDBF selects it again. In consequence, the starvation problem leaves our system with effectively one log block instead of 7.

When there is not enough time to clean log blocks in explicit garbage collection, SDBF and LDBF exhibit a trade-off. As mentioned before, SDBF can incur overhead by selecting a log block whose fragments will soon be invalidated anyway; this overhead appears from 0 s to 0.3 s of wait-time in Figure 3. When the wait-time exceeds 0.3 s, the graph shows that SDBF's overhead is eliminated because it provides more log blocks. With a sufficiently large wait-time, all the log blocks can be invalidated.

4.3 The Overhead of the Proposed Policies

To quantify the overhead of our proposed victim selection policies, we measured the total erasure count over all erasure blocks. As shown in Figure 4, it follows a trend similar to Figure 3: every method incurs some erasure-count overhead from unnecessary valid-copies. Without explicit garbage collection, the starvation problem persists and acts as overhead. The erasure count is lower under SDBF or LDBF according to how many log blocks are recycled: given more time to merge, both SDBF and LDBF reclaim more log blocks for reuse. Given enough time for explicit garbage collection, SDBF invalidates more log blocks than LDBF does in the same amount of time.

The results in Figures 3 and 4 demonstrate the effectiveness of explicit garbage collection. Moreover, when there is not enough time for valid-copies, LDBF is more efficient than SDBF because of SDBF's unnecessary valid-copy overhead; when there is enough time, the ranking reverses.

5. Conclusion

This paper was motivated by the need for a more efficient garbage collection technique in our implemented FAT-aware log-based Flash Translation Layer. We implemented an FTL optimized specifically for the FAT file system, and we proposed two versions of a victim selection policy for the implicit and explicit garbage collection in our FTL. We demonstrated the performance of our system in terms of efficiency and overhead.
In this paper, we showed the relationship between the proposed victim selection policies across various wait-times. Other techniques not addressed in this paper could further optimize our system: we could analyze the input patterns and propose a more efficient victim selection policy, such as a dynamic one that adapts as the environment changes. In the dynamic case, we would need to predict precisely the changes in the environment, or the turning point between the policies.

References

[1] A. Kawaguchi, S. Nishioka, and H. Motoda, "A Flash Memory Based File System," Proceedings of the USENIX Technical Conference, 1995.
[2] K. Han-Joon and L. Sang-goo, "A New Flash Memory Management for Flash Storage System," Proceedings of the Computer Software and Applications Conference, 1999.
[3] M. L. Chiang, C. H. Paul, and R. C. Chang, "Manage Flash Memory in Personal Communicate Devices," Proceedings of the IEEE International Symposium on Consumer Electronics, 1997.
[4] V. Malik, "JFFS2 is Broken," Mailing List of the Memory Technology Device (MTD) Subsystem for Linux, June 28, 2001.
[5] L.-P. Chang and T.-W. Kuo, "A Real-Time Garbage Collection Mechanism for Flash-Memory Storage Systems in Embedded Systems," The 8th International Conference on Real-Time Computing Systems and Applications, 2002.
[6] Journaling Flash File System, http://sources.redhat.com/jffs2/jffs2-html/
[7] "K9F2808U0B 16Mb*8 NAND Flash Memory Data Sheet," Samsung Electronics Company.