A Real-Time Garbage Collection Mechanism for Flash-Memory Storage Systems in Embedded Systems

Li-Pin Chang and Tei-Wei Kuo
{d6526009,ktw}
Department of Computer Science and Information Engineering
National Taiwan University, Taipei, Taiwan, ROC 106

Abstract

Flash memory technology is becoming critical in building embedded-system applications because of its shock-resistant, power-economic, and non-volatile nature. Because flash memory is a write-once and bulk-erase medium, a translation layer and a garbage-collection mechanism are needed to provide applications with a transparent storage service. In this paper, we propose a real-time garbage-collection mechanism, which provides deterministic performance, for hard real-time systems. A wear-levelling method, which serves as a non-real-time service, is presented to resolve the endurance issue of flash memory. The capability of the proposed mechanism is demonstrated by a series of experiments over our system prototype.

1 Introduction

Flash memory is not only shock-resistant and power-economic but also non-volatile. With the recent technology breakthroughs in both capacity and reliability, more and more (embedded) system applications now deploy flash memory for their storage systems. For example, the manufacturing systems in factories must be able to tolerate severe vibration, which may cause damage to hard disks; flash memory is therefore a good choice for such systems. Flash memory is also suitable for portable devices, which have a limited energy source from batteries, to lengthen their operating time.

There are two major issues in implementing a flash-memory storage system: the nature of flash memory in (1) write-once and bulk-erasing, and (2) the endurance issue. Because flash memory is write-once, existing data cannot be overwritten directly. Instead, the newer version of the data is written to some available space elsewhere; the old version of the data is then invalidated and considered "dead", while the latest version of the data is considered "live". As a result, the physical locations of data change from time to time because of this out-place-update scheme. A bulk erasing could be initiated when a flash-memory storage system has a large number of live and dead data mixed together, and it could involve a significant number of live-data copyings, since the live data on the erased region must be copied elsewhere before the erasing. This recycling of the space occupied by dead data is called garbage collection. However, each erasable unit of flash memory has a limited erase cycle count, which imposes another restriction on flash-memory management strategies (the endurance issue).

In past work, various techniques were proposed to improve the performance of garbage collection for flash memory, e.g., [1, 7, 8]. In particular, Kawaguchi, et al. proposed the cost-benefit policy [1], which uses a value-driven heuristic function as a block-recycling policy. Chiang, et al. [8] refined the above work by considering the locality in run-time access patterns. Kwoun, et al. [7] proposed to periodically move live data among blocks so that blocks have more even life-times. Although researchers have proposed excellent garbage-collection policies, little work has been done on providing a deterministic performance guarantee for flash-memory storage systems. It has been shown that garbage collection could impose almost 40 seconds of blocking time on time-critical tasks without proper management [10]. This paper is motivated by the need for a predictable garbage-collection mechanism that provides deterministic garbage-collection performance in time-critical systems.

The rest of this paper is organized as follows: Section 2 introduces the system architecture of a real-time flash-memory storage system. Section 3 presents our real-time block-recycling policy, the free-page replenishment mechanism, and the supports for non-real-time tasks. The proposed mechanism is analyzed in Section 4. Section 5 summarizes the experimental results. Section 6 is the conclusion and the future research.
                   Page Read     Page Write    Block Erase
                   (512 bytes)   (512 bytes)   (16K bytes)
Performance (µs)   348           909           1,881
Symbol             tr            tw            te

Table 1: Performance of the NAND Flash Memory (controlled through the ISA bus).

Figure 1: System Architecture. (Real-time and non-real-time tasks, with one garbage collector per real-time task, run over a real-time scheduler and a time-sharing scheduler. Beneath them sit the file system (fread, fwrite), the block-device emulation (FTL) offering sector/page read and write plus the control services ioctl_copy and ioctl_erase, and the flash-memory driver issuing read, write, erase, and control signals to the flash memory.)

2 System Architecture

This section proposes the system architecture of a real-time flash-memory storage system, as shown in Figure 1. We selected NAND flash to realize our storage system. There are two major architectures in flash-memory design: NOR flash and NAND flash. NOR flash is a kind of EEPROM, while NAND flash is designed for data storage. We study NAND flash in this paper because it has a better price/capacity ratio than NOR flash.

A NAND flash-memory chip is partitioned into blocks, where each block has a fixed number of pages, and each page is a fixed-size byte array. Due to the hardware architecture, data on a flash are written in units of one page, and erases are performed in units of one block. A page can be written only if it has been erased, and a block erase clears all data on its pages. Logically, a page that contains live data is called a "live page", and a "dead page" contains dead (invalidated) data. Furthermore, each block has an individual limit (theoretically equivalent in the beginning) on the number of erase cycles; a worn-out block will suffer from frequent write errors. The block size of a typical NAND flash is 16KB, and the page size is 512B. The endurance of each block is usually 1,000,000 cycles under the current technology.

There are several different approaches to managing flash memory for storage systems. We must point out that NAND flash prefers the block-device emulation approach because NAND flash is a block-oriented medium (note that a NAND flash page fits a disk sector in size). A "Flash memory Translation Layer" (FTL) is introduced to emulate a block device for flash memory so that applications can have a transparent storage service over flash memory. There is an alternative approach which builds a native flash-memory file system over NOR flash; we refer interested readers to [5] for more details.

We propose to support real-time tasks and reasonable services for non-real-time tasks through a real-time scheduler and a time-sharing scheduler, respectively. A real-time garbage collector is initiated for each real-time task which might write data to flash memory, to reclaim free pages for the corresponding task. Real-time garbage collectors interact directly with the FTL through specially designed control services. The control services include ioctl_erase, which performs block erasing, and ioctl_copy, which performs atomic copying. Each atomic copy, which consists of a page read and then a page write, is designed to realize live-page copying in garbage collection. The read-then-write operation is non-preemptible in order to prevent any possibility of a race condition. Note that no garbage-collection tasks are created for the non-real-time tasks. Instead, the FTL must reclaim a sufficient number of free pages for non-real-time tasks, as in typical flash-memory storage systems, so that reasonable performance is provided.

3 Real-Time Garbage Collection Mechanism

3.1 Characteristics of Flash Memory Operations

A typical flash-memory chip supports three kinds of operations: page read, page write, and block erase. The performance of the three operations, measured on a real prototype, is listed in Table 1. The prototype has average read and write speeds of 1.4MB/sec and 540KB/sec, respectively, if no garbage-collection activities are involved. Block erases take a much longer time than the other operations. As mentioned in Section 2, we adopted a special control service (ioctl_copy) to process live-page copying for real-time garbage collection. The atomic copy operation copies data from one page to another, given the two page addresses. From the perspective of the FTL's clients, the FTL is capable of processing page read, page write, block erase, and atomic copy.

Flash memory is a programmed-I/O device; in other words, flash-memory operations are very CPU-consuming. As a result, the CPU is fully occupied during the execution of each flash-memory operation.
Figure 2: A task Ti which reads and writes flash memory. (Legend: a preemptible CPU computation region; R, a non-preemptible read request to flash memory; W, a non-preemptible write request to flash memory.)

Symbol   Description
Λ        the total number of live pages currently on flash
∆        the total number of dead pages currently on flash
Φ        the total number of free pages currently on flash
π        the number of pages in each block
Θ        the total number of pages on flash (= Λ + ∆ + Φ)
α        the constant lower-bound of the number of reclaimed free pages after each block recycling
ρ        the total number of tokens in system
ρfree    the number of unallocated tokens in system
ρTi      the number of tokens given to Ti
ρGi      the number of tokens given to Gi
ρnr      the number of tokens given to non-real-time tasks

Table 2: Symbol Definitions.

On the other hand, flash-memory operations are non-preemptible, since the operations cannot be interleaved with one another. We can therefore treat flash-memory operations as non-preemptible portions of tasks. As shown in Table 1, the longest non-preemptible operation is a block erase, which takes 1,881 µs to complete.

3.2 Real-Time Task Model

This section illustrates how a real-time task represents its requirements for flash-memory storage service and its timing constraint. Each periodic real-time task Ti is defined as a triple (cTi, pTi, wTi), where cTi, pTi, and wTi denote the CPU requirements, the period, and the maximum number of its page writes per period, respectively. The CPU requirements cTi consist of the CPU computation time and the flash-memory operating time: suppose that task Ti wishes to use the CPU for ccpu_Ti µs, read i pages from flash memory, and write j pages to flash memory in each period. The CPU requirements cTi can then be calculated as ccpu_Ti + i*tr + j*tw (tr and tw can be found in Table 1). If Ti does not wish to write to flash memory, it may set wTi = 0. Figure 2 illustrates a periodic real-time task Ti that issues one page read and two page writes in each period (note that the order of the read and the writes does not need to be fixed in this paper).

For example, in a manufacturing system, a task T1 might periodically fetch control codes from files, drive the mechanical gadgets, sample readings from the sensors, and then update the status to some specified files. Suppose that ccpu_T1 = 1ms and pT1 = 20ms. Let T1 wish to read a 2K-sized fragment from a data file and write 256 bytes of machine status back to a status file. Assume that the status file already exists, and that we have one page of spontaneous file-system meta-data to read and one page of meta-data to write. As a result, task T1 can be defined as (1000 + (1 + ⌈2048/512⌉)*tr + (1 + ⌈256/512⌉)*tw, 20*1000, 1 + ⌈256/512⌉) = (4558, 20000, 2). Note that the time granularity is 1µs.

Molano, et al. [2] pointed out that the access of file-system meta-data may introduce unbounded delay. They proposed a meta-data pre-fetch scheme which loads the necessary meta-data into RAM to provide deterministic behavior in accessing meta-data. In this paper, we assume that each real-time task has deterministic behavior in accessing meta-data, so that we can focus on the real-time support issue for flash-memory storage systems.

3.3 Real-Time Garbage Collection

In this section, we present our real-time garbage-collection mechanism. We first propose the idea of real-time garbage collectors and the free-page replenishment mechanism for garbage collection. We then present a block-recycling policy to choose appropriate blocks to recycle.

3.3.1 Real-Time Garbage Collectors

For each real-time task Ti which may write to flash memory (wTi > 0), we propose to create a corresponding real-time garbage collector Gi. Gi recycles a block in each of its periods, and it should reclaim and supply Ti with enough free pages. Let a constant α denote a lower bound on the number of free pages that can be reclaimed by each block recycling (we show how to derive this lower bound in Section 4.1), and let π denote the number of pages per block. Given a real-time task Ti = (cTi, pTi, wTi) with wTi > 0, the corresponding real-time garbage collector is created as follows:

    cGi = (π − α) * (tr + tw) + te + ccpu_Gi

    pGi = pTi / ⌈wTi/α⌉,  if wTi > α
    pGi = pTi * ⌊α/wTi⌋,  otherwise                                    (1)

The CPU demand cGi consists of at most (π − α) live-page copyings, a block erase, and the computation requirements ccpu_Gi. The worst-case CPU requirements of all real-time garbage collectors are the same, since α is a constant lower bound.
Figure 3: Creation of the Corresponding Real-Time Garbage Collectors. (In (a), T1 with wT1 = 30 consumes 30 tokens per pT1 while G1 supplies 16 tokens in each pG1 = pT1/2; in (b), T2 with wT2 = 3 consumes 3 tokens per pT2 while G2 supplies 16 tokens in each pG2 = 5*pT2.)

Ti()
{
    if (beginningOfMetaPeriod())
    {
        // give up the extra tokens
        x = ρTi − (wTi * σi / pTi);
        ρTi = ρTi − x;  ρ = ρ − x;
    }
    .......  /* job of Ti */
}

Gi()
{
    if ((Φ − ρ) ≥ α)
    {
        // create tokens from the existing free pages
        ρGi = ρGi + α;  ρ = ρ + α;
    }
    else
    {
        // recycle a block
        recycleBlock();    // note that Φ, ρ, and ρGi change
    }
    // supply Ti with α tokens
    ρTi = ρTi + α;  ρGi = ρGi − α;
    // give up the residual tokens
    z = ρGi − (π − α);  ρGi = ρGi − z;  ρ = ρ − z;
}

Figure 4: The Algorithm of the Free-Page Replenishment Mechanism.

Obviously, the estimation of cGi is conservative, and Gi might not consume its entire CPU requirements in each period. The period pGi is set under the guideline of supplying Ti with enough free pages; its length depends on how fast Ti consumes free pages. We let Gi and Ti arrive in the system at the same time.

Figure 3 provides two examples of real-time garbage collectors, with the system parameter α = 16. In Figure 3(a), because wT1 = 30, pG1 is set to one half of pT1 so that G1 can reclaim 32 free pages for T1 in each pT1. As astute readers may point out, more free pages than necessary may be reclaimed; we address this issue in the next section. In Figure 3(b), because wT2 = 3, pG2 is set to five times pT2 so that 16 pages are reclaimed for T2 in each pG2. The reclaimed free pages are enough for T1 and T2 to consume in each pT1 and pG2, respectively. We define the meta-period σi of Ti and Gi as pTi if pTi ≥ pGi; otherwise, σi is equal to pGi. In the above examples, σ1 = pT1 and σ2 = pG2. The meta-period will be used in later sections.

3.3.2 Free-Page Replenishment Mechanism

The consuming and reclaiming of free pages at run-time usually do not follow the worst case. In this section, we provide a token-based "free-page replenishment mechanism" to manage the free pages and their reclaiming more efficiently and flexibly. Consider a real-time task Ti = (cTi, pTi, wTi) with wTi > 0. Initially, Ti is given (wTi * σi / pTi) tokens, and one token is good for executing one page write (please see Section 3.3.1 for the definition of σi). We require that a (real-time) task cannot write any page if it does not own any token. Note that a token does not correspond to any specific free page in the system. Several counters of tokens are maintained: ρinit denotes the total number of available tokens when the system starts; ρ denotes the total number of tokens currently in the system (regardless of whether they are allocated or not); ρfree denotes the number of unallocated tokens currently in the system; and ρTi and ρGi denote the numbers of tokens currently given to task Ti and Gi, respectively. The symbols for token counters are summarized in Table 2.

Initially, Ti and Gi are given (wTi * σi / pTi) and (π − α) tokens, respectively, to prevent Ti and Gi from being blocked in their first meta-period. During their executions, Ti and Gi behave like a pair of consumer and producer of tokens. Gi creates tokens from the free pages it reclaims in block recycling and provides them to Ti in each of its periods pGi; the replenished tokens are enough for Ti to write its pages in each of its meta-periods σi. When Ti consumes a token, both ρTi and ρ are decreased by one. When Gi reclaims a free page and thus creates a token, ρGi and ρ are both increased by one. Basically, by the end of each period of Gi, Gi provides Ti the tokens created in that period. When Ti and Gi terminate, their tokens must be returned to the system.

The above replenishment mechanism has some problems: First, real-time tasks might consume free pages more slowly than they declared; since Gi always replenishes Ti with at least α created tokens, tokens would gradually accumulate at Ti. Secondly, the real-time garbage collectors might reclaim more free pages than the worst-case estimate requires: Gi might unnecessarily reclaim free pages even though there are already sufficient free pages in the system. We therefore propose to refine the basic replenishment mechanism as follows:

At the beginning of each meta-period σi, Ti gives up ρTi − (wTi * σi / pTi) tokens and decreases ρTi and ρ by the same number, because those tokens are beyond the needs of Ti. At the beginning of each period of Gi, Gi checks whether (Φ − ρ) ≥ α, where the variable Φ denotes the total number of free pages currently in the system. If the condition holds, then Gi takes α free pages from the system, and ρGi and ρ are incremented by α, instead of actually performing a block recycling.
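The per-period bookkeeping on the Gi side, covering both the token-minting case just described and its converse, can be condensed as follows. This is a minimal sketch of the counters only: the hypothetical recycle_block() stands in for the real erase-plus-copy work and optimistically assumes exactly α pages are reclaimed.

```c
static int PHI;        /* free pages on flash (Φ)   */
static int rho;        /* tokens in system (ρ)      */
static int rho_ti;     /* tokens held by Ti (ρTi)   */
static int rho_gi;     /* tokens held by Gi (ρGi)   */

/* Hypothetical stand-in: a block recycling that reclaims exactly
 * alpha free pages and turns them into tokens held by Gi. */
static void recycle_block(int alpha)
{
    PHI    += alpha;            /* pages reclaimed by the erase */
    rho_gi += alpha;
    rho    += alpha;
}

/* One period of Gi under the refined replenishment mechanism. */
static void gi_period(int alpha, int pi)
{
    int z;
    if (PHI - rho >= alpha) {   /* enough spare free pages: mint tokens */
        rho_gi += alpha;
        rho    += alpha;
    } else {
        recycle_block(alpha);   /* otherwise actually recycle a block */
    }
    rho_ti += alpha;            /* supply Ti with alpha tokens */
    rho_gi -= alpha;
    z = rho_gi - (pi - alpha);  /* give up the residual tokens */
    if (z > 0) { rho_gi -= z; rho -= z; }
}
```

The guard `PHI - rho >= alpha` is what lets Gi skip a physical erase whenever the system already holds α more free pages than outstanding tokens.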
Otherwise (i.e., when (Φ − ρ) < α), Gi initiates a block recycling to reclaim free pages and create tokens. Suppose that Gi now has y tokens (regardless of whether they come from a block erasing or as a gift from the system) and then gives α tokens to Ti. Gi might give up y − α − (π − α) = y − π tokens because they are beyond its needs, where (π − α) is the number of tokens needed for the live-page copyings done by Gi. Figure 4 provides the algorithm of the refined free-page replenishment mechanism. Again, readers can find all symbols used in this section in Table 2.

3.3.3 A Block-Recycling Policy

A block-recycling policy decides which blocks should be erased during garbage collection. The previous two sections proposed the idea of real-time garbage collectors and the free-page replenishment mechanism; the only thing missing is a policy for the free-page replenishment mechanism to choose appropriate blocks for recycling. The purpose of this section is to propose a greedy policy that delivers a predictable performance on garbage collection.

We propose to recycle the block which has the largest number of dead pages. Obviously, the worst case in the number of free pages reclaimed happens when all dead pages are evenly distributed among all blocks in the system. The number of free pages reclaimed in the worst case by a block recycling can thus be denoted as:

    π * (∆ / Θ),                                                       (2)

where π, ∆, and Θ denote the number of pages per block, the total number of dead pages on flash, and the total number of pages on flash, respectively. We refer readers to Table 2 for the definitions of all symbols used in this paper.

Formula 2 denotes the worst-case performance of the greedy policy, which is proportional to the ratio of the number of dead pages to the number of pages on flash memory. That is an interesting observation, because we cannot obtain a high garbage-collection performance by simply increasing the flash-memory capacity. We must emphasize that π and Θ are constants in a system; a greedy policy which properly manages ∆ would result in a better lower bound on the number of free pages reclaimed in a block recycling. Because Θ = ∆ + Φ + Λ, Equation 2 can be rewritten as follows:

    π * (Θ − Φ − Λ) / Θ,                                               (3)

where π and Θ are constants, and Φ and Λ denote the numbers of free pages and live pages, respectively, when the block-recycling policy is activated. As shown in Equation 3, the worst-case performance of the block-recycling policy can be controlled by bounding Φ and Λ.

Because it is not necessary to perform garbage collection if there are already sufficient free pages in the system, the block-recycling policy can be activated only when the number of free pages is less than a threshold value (a bound for Φ). Furthermore, in order to bound the number of live pages in the system (i.e., Λ), we propose to reduce the total number of FTL-emulated LBA's (where LBA stands for the "Logical Block Address" of a block device). For example, we can emulate a 48MB block device using a 64MB NAND flash memory. We argue that this intuitive approach is affordable, since the capacity of flash-memory chips is increasing very rapidly.¹ For example, suppose that the block-recycling policy is always activated when Φ ≤ 100, and that there are 32 pages in a block; the worst-case performance of the greedy policy is then 32 * (131,072 − 100 − 98,304) / 131,072 ≈ 8 pages per recycling. The major challenge in guaranteeing the worst-case performance is how to properly set a bound for Φ for each block recycling, since the number of free pages in the system might grow and shrink from time to time. We discuss this issue further in Section 4.1.

¹ The capacity of a single NAND flash-memory chip had grown to 256MB when we wrote this paper.

3.4 Supports for Non-Real-Time Tasks

The system design aims at the simultaneous support of both real-time and non-real-time tasks. In this section, we extend the token-based free-page replenishment mechanism of the previous section to supply free pages to non-real-time tasks. A non-real-time wear-leveller is then proposed to resolve the endurance issue.

3.4.1 Free-Page Replenishment for Non-Real-Time Tasks

Different from real-time tasks, all non-real-time tasks share one collection of tokens, denoted by ρnr. Initially, π tokens are given to the non-real-time tasks (ρnr = π) for the purpose of live-page copying during garbage collection for non-real-time tasks. Before a non-real-time task issues a page write, it should check whether ρnr > π. If the condition holds, the page write is executed, and one token is consumed (ρnr and ρ are decreased by one). Otherwise (i.e., ρnr ≤ π), the system must replenish itself with tokens for non-real-time tasks. The token creation for non-real-time tasks is similar to the strategy adopted by real-time garbage collectors: if Φ ≤ ρ, a block recycling is initiated to reclaim free pages and create tokens; if Φ > ρ, then there might not be any need for a block recycling.
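The write gate for non-real-time tasks, together with the greedy victim choice of Section 3.3.3, can be sketched as follows. The per-block dead-page counts and the gating function are our own illustrative model, not the prototype's FTL code:

```c
#define PI 32                  /* pages per block */

static long rho_total;         /* total tokens in system (ρ)    */
static long rho_nr;            /* tokens shared by non-RT tasks */

/* Greedy block-recycling policy (Section 3.3.3): pick the block
 * with the largest number of dead pages. */
static int pick_victim(const int dead[], int nblocks)
{
    int best = 0, i;
    for (i = 1; i < nblocks; i++)
        if (dead[i] > dead[best])
            best = i;
    return best;
}

/* Gate a non-real-time page write.  Returns 1 if the write may
 * proceed (one token consumed); returns 0 if the system must first
 * replenish itself, recycling a block only when free pages do not
 * already exceed the outstanding tokens (Φ ≤ ρ). */
static int nrt_page_write_allowed(void)
{
    if (rho_nr > PI) {         /* keep π tokens for live-page copying */
        rho_nr--;
        rho_total--;
        return 1;
    }
    return 0;
}
```

Keeping π tokens in reserve mirrors the text above: the reserve guarantees that a garbage-collection pass on behalf of non-real-time tasks always has tokens available for its live-page copyings.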
3.4.2 A Non-Real-Time Wear-Leveller

As mentioned in Section 2, each block has an individual limit on its number of erase cycles. In order to lengthen the overall lifetime of the flash memory, the management software should try to wear each block as evenly as possible; this is so-called "wear-levelling". In past work, researchers tended to resolve the wear-levelling issue inside the block-recycling policy. A typical wear-levelling-aware block-recycling policy might sometimes recycle the block with the least erase count, regardless of how many free pages can be reclaimed. (The erase count denotes the number of erases performed on the block so far.) This approach is not suitable for a time-critical application, where predictability is an important issue.

We propose instead to use non-real-time tasks for wear-levelling, separating the wear-levelling policy from the block-recycling policy. We create a non-real-time wear-leveller, which sequentially scans a given number of blocks to see whether any block has a relatively small erase count. We say that a block has a relatively small erase count if its count is less than the average by a given number, e.g., 2. One of the main reasons why some blocks have relatively small erase counts is that those blocks contain much live-cold data (data which have not been invalidated or updated for a long period of time). Such blocks are obviously not good candidates for recycling, hence they are seldom erased. When the wear-leveller finds a block with a relatively small erase count, it first copies the live pages from that block to some other blocks and then sleeps for a while. As the wear-leveller repeats the scanning and live-page copying, the dead pages on the target blocks gradually increase; as a result, those blocks will be selected by the block-recycling policy sooner or later. The idea is to "defrost" the blocks by moving the live-cold data away.

4 System Analysis

The purpose of this section is to provide a sufficient condition to guarantee the worst-case performance of the block-recycling policy; as a result, we can justify the performance of the free-page replenishment mechanism. An admission-control strategy is also introduced.

4.1 The Performance of the Block-Recycling Policy

The performance guarantees of the real-time garbage collectors and the free-page replenishment mechanism are based on a constant α, i.e., a lower bound on the number of free pages that can be reclaimed by each block recycling. As shown in Section 3.3.3, guaranteeing the performance of the block-recycling policy is not trivial, since the number of free pages on flash, i.e., Φ, may grow or shrink from time to time. In this section, we derive the relationship between α and Λ (the number of live pages on the flash) as a sufficient condition for engineers to guarantee the performance of the block-recycling policy for a specified value of α. As indicated by Equation 2, the system must satisfy the following condition:

    ⌊(∆ / Θ) * π⌋ ≥ α.                                                 (4)

Since ∆ = (Θ − Λ − Φ), and (⌊x⌋ ≥ y) implies (x > y − 1) when y is an integer, Equation 4 can be rewritten as follows:

    Φ < Θ * (1 − (α − 1)/π) − Λ.                                       (5)

The formula shows that the number of free pages, i.e., Φ, must be controlled under Equation 5 if α is to be guaranteed (under the utilization of the flash, i.e., Λ). Because real-time garbage collectors initiate block recyclings only if Φ − ρ < α (please see Section 3.3.2), the largest possible value of Φ at each block recycling is (α − 1) + ρ. Letting Φ = (α − 1) + ρ, Equation 5 can again be rewritten as follows:

    ρ < Θ * (1 − (α − 1)/π) − Λ − α + 1.                               (6)

Note that, given a specified bound α and the (even conservatively estimated) utilization Λ of the flash, the total number of tokens ρ in the system should be controlled under Equation 6. In the following paragraphs, we first derive ρmax, the largest possible number of tokens in the system under the proposed garbage-collection mechanism. Because ρmax must also satisfy Equation 6, we then obtain the relationship between α and Λ as a sufficient condition for engineers to guarantee the performance of the block-recycling policy for a specified value of α. Note that ρ may go up and down, as shown in Section 3.3.1.

We now consider the case which has the maximum number of tokens accumulated. The case occurs when non-real-time garbage collection has recycled a block without any live-page copying; as a result, the non-real-time tasks could hold up to 2π tokens, where π tokens are reclaimed by the block recycling and the other π tokens are reserved for live-page copying. Now consider real-time garbage collection: within a meta-period σi of Ti (and Gi), assume that Ti consumes none of its reserved (wTi * σi / pTi) tokens.
On the other hand, Gi replenishes Ti with (α * σi / pGi) tokens. In the best case, Gi recycles a block without any live-page copying in the last period of Gi within the meta-period. As a result, Ti could hold up to (wTi * σi / pTi) + (α * σi / pGi) tokens, and Gi could hold up to 2 * (π − α) tokens, where (π − α) tokens are the extra tokens created and the other (π − α) tokens are reserved for live-page copying. Suppose that all real-time tasks Ti and their corresponding real-time garbage collectors Gi behave in the same way as described above; Figure 5 illustrates this situation based on the task set in Figure 3.

Figure 5: An Instant of Having the Largest Number of Tokens in System (based on the example in Figure 3).

The largest potential number of tokens in the system, ρmax, is then as follows, where ρfree is the number of unallocated tokens:

    ρmax = ρfree + 2π + Σ_{i=1..n} [ (wTi * σi / pTi) + (α * σi / pGi) + 2(π − α) ].   (7)

When ρ = ρmax, we obtain the following by combining Equation 6 and Equation 7:

    ρfree + 2π + Σ_{i=1..n} [ (wTi * σi / pTi) + (α * σi / pGi) + 2(π − α) ]
        < Θ * (1 − (α − 1)/π) − Λ − α + 1.                             (8)

Equation 8 shows the relationship between α and Λ. We must emphasize that this equation serves as a sufficient condition for engineers to guarantee the performance of the block-recycling policy for a specified value of α, given the number of live pages Λ in the system, i.e., the utilization of the flash. We address the admission control of real-time tasks in the next section, based on α.

4.2 Admission Control

The admission control of a real-time task must consider its resource requirements and its impact on other tasks. The purpose of this section is to derive a formula for admission control. Given a set of real-time tasks {T1, T2, ..., Tn} and their corresponding real-time garbage collectors {G1, G2, ..., Gn}, let cTi, pTi, and wTi denote the CPU requirements, the period, and the maximum number of page writes per period of task Ti, respectively. Suppose that cGi = pGi = 0 when wTi = 0 (because no real-time garbage collector is needed for such a Ti). Since the focus of this paper is on flash-memory storage systems, the admission control of the entire real-time task set must consider whether the system can give enough tokens to all tasks initially. The verification can be done by evaluating the following formula:

    Σ_{i=1..n} [ (wTi * σi / pTi) + (π − α) ] ≤ ρfree.                 (9)

Note that ρfree = ρinit when the system starts, where ρinit is the number of initially reserved tokens in the system. Besides this test, engineers should also verify the relationship between α and Λ by means of Equation 8. These verifications can be done in linear time.

Other than the above tests, readers might want to verify the schedulability of the real-time tasks in terms of CPU utilization. Suppose that the earliest-deadline-first algorithm (EDF) [3] is adopted to schedule all real-time tasks and their corresponding real-time garbage collectors. Since all flash-memory operations are non-preemptive, and block erasing is the most time-consuming operation, the schedulability of the real-time task set can be verified by the following formula, provided that other system overheads are negligible:

    te / min_p() + Σ_{i=1..n} [ cTi/pTi + cGi/pGi ] ≤ 1,               (10)

where min_p() denotes the minimum period of all Ti and Gi, and te is the duration of a block erase. Because every real-time task might be blocked by a block erase issued by either real-time or non-real-time garbage collection, the longest blocking time of every task is the same, i.e., te. Equation 10 can be derived from Theorem 2 in T. P. Baker's work [9], which presents a sufficient condition for the schedulability of tasks with non-preemptible portions. We must emphasize that the above formula only intends to convey the idea of blocking time due to flash-memory operations.
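The offline checks of Sections 4.1 and 4.2 reduce to a few lines of integer arithmetic. The sketch below (function names and the worked numbers are our own; a 16MB flash gives Θ = 32,768 pages, and the usage plugs in the task timings from the experiments, in µs) evaluates the Φ and ρ bounds of Equations 5 and 6 and the two admission tests of Equations 9 and 10:

```c
/* Equation 5: exclusive upper bound on the free pages Φ so that a
 * greedy recycling still reclaims at least alpha pages.  Assumes
 * Theta*(alpha-1) is divisible by pi; otherwise round the
 * subtracted term up to stay conservative. */
static long phi_bound(long theta, long lambda, int alpha, int pi)
{
    return theta - theta * (alpha - 1) / pi - lambda;
}

/* Equation 6: the corresponding exclusive bound on the total number
 * of tokens rho, obtained by substituting Phi = (alpha-1) + rho. */
static long rho_bound(long theta, long lambda, int alpha, int pi)
{
    return phi_bound(theta, lambda, alpha, pi) - alpha + 1;
}

/* Equation 9: the initial token demand of all tasks and collectors
 * must not exceed the unallocated tokens rho_free. */
static int tokens_admissible(int n, const long w[], const long sigma[],
                             const long p[], int pi, int alpha,
                             long rho_free)
{
    long need = 0;
    int i;
    for (i = 0; i < n; i++)
        need += w[i] * sigma[i] / p[i] + (pi - alpha);
    return need <= rho_free;
}

/* Equation 10: EDF schedulability with a non-preemptible block
 * erase of length te: te/min_p + sum(cTi/pTi + cGi/pGi) <= 1. */
static int edf_admissible(int n, const long c_t[], const long p_t[],
                          const long c_g[], const long p_g[],
                          long te, long min_p)
{
    double u = (double)te / (double)min_p;
    int i;
    for (i = 0; i < n; i++) {
        u += (double)c_t[i] / (double)p_t[i];
        if (p_g[i] > 0)
            u += (double)c_g[i] / (double)p_g[i];
    }
    return u <= 1.0;
}
```

For the two-task workload used later in the experiments (cT = 6,354 and 8,738 µs, pT = 20 and 200 ms, cG = 22,003 µs, pG = 160 and 600 ms, te = 1,881 µs), the total utilization comes to roughly 0.63, so the set passes the Equation 10 test with room to spare.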
5 Performance Evaluation

In this section, we compare the behaviors of a system prototype with and without the proposed real-time garbage collection mechanism. We also show the effectiveness of the proposed wear-levelling method, which is executed as a non-real-time service.

5.1 Simulation Workloads

A series of experiments were done over a real system prototype. A set of tasks was created to emulate the workload of a manufacturing system which stored all files on flash memory to prevent vibration from damaging the system. The basic workload consisted of two real-time tasks T1 and T2 and one non-real-time task. T1 and T2 sequentially read the machine control files, did some computations, and updated their own (small) status files. The non-real-time task emulated a file downloader, which downloaded the machine control files continuously over a local-area network and then wrote the downloaded file contents onto flash memory. The flash memory used in the experiments was a 16MB NAND flash [6]. The block size was 16KB, and the page size was 512B (π = 32). The traces of T1, T2, and the non-real-time file downloader were synthesized to emulate the necessary requests for the file system. The basic simulation workload is detailed in Table 3. The duration of each experiment was 20 minutes.

    Task   ccpu   page reads   page writes (w)   c          p       c/p
    T1     3ms    4            2                 6,354 µs   20ms    0.32
    T2     5ms    2            5                 8,738 µs   200ms   0.05

Table 3: The Basic Simulation Workload.

5.2 The Evaluation of Garbage Collection Mechanisms

In order to observe the behavior of a non-real-time garbage collection mechanism in a time-critical system, we evaluated a system which adopted a non-real-time garbage collection mechanism under the basic workload and a 50% flash-memory capacity utilization. The non-real-time garbage collection mechanism basically followed the well-known cost-benefit block-recycling policy [1], but it might recycle a block whose erase count was less than the average erase count by 2, in order to perform wear-levelling. Note that garbage collection was activated on demand. We defined an instance of garbage collection to consist of all of the activities which started when the garbage collection was invoked and ended when the garbage collection returned control to the blocked page write. We measured the blocking time imposed on a page write by each instance of garbage collection. The results in Figure 6 show that the blocking time could even reach 21.209 seconds in the worst case.

[Figure 6: The Blocking Time Imposed on each Page Write by a Non-Real-Time Garbage Collection Instance. The y-axis (log scale) shows the blocking time in ms; the x-axis shows the number of garbage collection instances invoked so far (K).]

The lengthy and unpredictable blocking time was mainly caused by wear-levelling. As astute readers may notice, an instance of garbage collection might consist of recycling several blocks consecutively, because a block without dead pages might be recycled due to wear-levelling. The block-recycling policy continued recycling blocks until at least one free page was reclaimed.

We also examined whether the real-time tasks were ever blocked due to an insufficient number of tokens under the adoption of the proposed real-time garbage collection mechanism. The same basic configurations were used in this experiment. The corresponding real-time garbage collectors were created according to the method described in Section 3.3.1, and the workload is summarized in Table 4. T1 and T2 generated 12,000 and 3,000 page writes, respectively, in the 20-minute experiment. The results are shown in Figure 7. We observed no blocking time being imposed on the page writes of T1 and T2 (as we expected).

    Task   ccpu    page reads   page writes (w)   c           p       c/p
    T1     3ms     4            2                 6,354 µs    20ms    0.32
    T2     5ms     2            5                 8,738 µs    200ms   0.05
    G1     10 µs   16           16                22,003 µs   160ms   0.14
    G2     10 µs   16           16                22,003 µs   600ms   0.04

Table 4: The Simulation Workload for Evaluating the Real-Time Garbage Collection Mechanism (α = 16, Capacity Utilization = 50%).

5.3 Effectiveness of the Wear-Levelling Method

The second part of the experiments evaluated the effectiveness of wear-levelling in terms of the standard deviation of the erase counts of all blocks. A lower value indicated that all blocks were erased more evenly.
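The block-recycling behavior evaluated above — cost-benefit scoring [1] combined with an erase-count trigger for wear-levelling — might be sketched as follows. This is a hypothetical illustration, not the prototype's code; the scoring function follows the usual cost-benefit formulation, and all names and the record layout are assumptions.

```python
# Hypothetical sketch of the evaluated block-recycling policy:
# cost-benefit scoring [1], plus recycling any block whose erase count
# falls below the average erase count by 2 (the wear-levelling tweak).

def pick_victim(blocks):
    """blocks: dicts with 'live' (live pages), 'pages' (pages per block),
    'age' (time since last modification), and 'erase_count'."""
    avg_erase = sum(b['erase_count'] for b in blocks) / len(blocks)
    # Wear-levelling: recycle a rarely-erased block even if it holds
    # no dead pages at all.
    for b in blocks:
        if b['erase_count'] < avg_erase - 2:
            return b
    # Cost-benefit: maximize age * (1 - u) / (2u), where u is the
    # fraction of live pages in the block.
    def score(b):
        u = b['live'] / b['pages']
        if u == 0:
            return float('inf')   # a fully dead block costs no copying
        return b['age'] * (1 - u) / (2 * u)
    return max(blocks, key=score)
```

The wear-levelling branch explains the long blocking times observed in Figure 6: it can keep selecting victims that contain no dead pages, so one garbage collection instance may recycle several blocks before a free page is reclaimed.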
[Figure 7: The Blocking Time Imposed on each Page Write of a Real-Time Task by Waiting for Tokens (Under the Real-Time Garbage Collection Mechanism). The blocking times (ms) of T1 and T2 are plotted against the number of page writes processed so far (K).]

[Figure 8: Effectiveness of the Wear-Levelling Method. The standard deviation of the erase counts of all blocks is plotted against the number of erases processed so far (K), for capacity utilizations of 50%, 60%, and 70%, for 50% with wear-levelling disabled, and for 50% under the non-real-time block-recycling policy.]

The objective of wear-levelling is to maintain a small value of the standard deviation. The non-real-time wear-leveller was configured as follows: it performed live-page copyings whenever the erase count of a block was less than the average erase count by 2, and it slept for 50ms between every two consecutive live-page copyings. Since past work showed that the overheads of garbage collection highly depend on the flash-memory capacity utilization [1, 8, 4], we ran the experiments under different capacity utilizations: 50%, 60%, and 70%.

As the results in Figure 8 show, the non-real-time wear-leveller gradually levelled the erase count of each block. The results also pointed out that the effectiveness of wear-levelling was better when the capacity utilization was low, since the system was not heavily loaded by the overheads of garbage collection. The standard deviation increased very rapidly when wear-levelling was disabled. We also observed that the standard deviation was stringently controlled under the non-real-time block-recycling policy.

6 Conclusion

This paper is motivated by the need for a garbage collection mechanism which is capable of providing a deterministic performance for time-critical systems. We propose a real-time garbage collection mechanism with a guaranteed performance. The endurance issue is also resolved by the proposed wear-levelling method. We demonstrate the performance of the proposed methodology in terms of a system prototype. However, there are some issues not addressed in this work: we should observe the overheads and performance of the proposed mechanism when the system is highly stressed. Furthermore, we shall extend the proposed mechanism to support time-critical applications over a NOR flash-based storage system.

References

[1] A. Kawaguchi, S. Nishioka, and H. Motoda, "A Flash-Memory Based File System," Proceedings of the USENIX Technical Conference, 1995.

[2] A. Molano, R. Rajkumar, and K. Juvva, "Dynamic Disk Bandwidth Management and Metadata Pre-fetching in a Real-Time Filesystem," Proceedings of the 10th Euromicro Workshop on Real-Time Systems, 1998.

[3] C. L. Liu and J. W. Layland, "Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment," Journal of the ACM, 1973.

[4] F. Douglis, R. Caceres, F. Kaashoek, K. Li, B. Marsh, and J. A. Tauber, "Storage Alternatives for Mobile Computers," Proceedings of the USENIX Symposium on Operating Systems Design and Implementation, 1994.

[5] Journaling Flash File System, http://sources.

[6] "K9F2808U0B 16Mb*8 NAND Flash Memory Data Sheet," Samsung Electronics Company.

[7] K. Han-Joon and L. Sang-goo, "A New Flash Memory Management for Flash Storage System," Proceedings of the Computer Software and Applications Conference, 1999.

[8] M. L. Chiang, C. H. Paul, and R. C. Chang, "Manage Flash Memory in Personal Communicate Devices," Proceedings of the IEEE International Symposium on Consumer Electronics, 1997.

[9] T. P. Baker, "A Stack-Based Resource Allocation Policy for Real-Time Process," IEEE 11th Real-Time Systems Symposium, Dec. 4-7, 1990.

[10] Vipin Malik, "JFFS2 is Broken," Mailing List of the Memory Technology Device (MTD) Subsystem for Linux, June 28, 2001.