  • Thanks for the chair’s introduction. This research is motivated by the endurance problem of flash-memory storage systems. Our goal is to propose an efficient static wear leveling mechanism to enhance the endurance of flash memory. This is joint work with Jen-Wei Hsieh and Tei-Wei Kuo.
  • Here is the outline of my presentation: first, a brief introduction to flash memory; then the system architecture of flash-memory storage systems and the motivation of this research; then the proposed mechanism and its performance evaluation; and finally, a brief conclusion.
  • Small size, non-volatility, shock resistance, low power consumption, and low cost. Moreover, System-on-a-Chip (SoC) designs and hybrid devices are becoming more and more popular.
  • Because each cell of flash memory cannot be overwritten before being erased.
  • The Memory Technology Device (MTD) driver provides primitive functions such as read, write, and erase. The Flash Translation Layer (FTL) driver is needed in the system for address translation and garbage collection. Flash memory is usually accessed by embedded systems either as a raw medium (for example, SmartMedia and Memory Stick) or indirectly through a block-oriented device (for example, CompactFlash and Disk on Module). In other words, the management of flash memory is carried out either by software on a host system or by hardware circuits and firmware inside the device. Our proposed mechanism is implemented in the flash translation layer.
  • There is an entry for each page in the translation table
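The per-page translation table mentioned above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the class and method names are hypothetical. It shows why a page-level table is memory-hungry: RAM grows linearly with the number of logical pages.

```python
# Illustrative sketch of a page-level address translation table
# (hypothetical names, not the authors' code). One entry per logical
# page maps a logical page address (LPA) to a physical page address (PPA).
INVALID = -1

class PageLevelTable:
    def __init__(self, num_logical_pages):
        # One entry per logical page -> RAM grows with flash capacity.
        self.table = [INVALID] * num_logical_pages

    def map_page(self, lpa, ppa):
        self.table[lpa] = ppa  # update the mapping after a write

    def lookup(self, lpa):
        return self.table[lpa]  # INVALID means not yet written

table = PageLevelTable(num_logical_pages=8)
table.map_page(3, 42)
print(table.lookup(3))  # prints 42
```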
  • Larger-capacity flash memory adopts MLC flash memory because of its low cost: MLC (multi-level cell) flash memory stores more than one bit of information per cell, but it endures fewer erase cycles. In contrast, SLC (single-level cell) flash memory stores only one bit per cell and endures more erase cycles, but it is more expensive.
  • If we overwrite data in place on flash memory, as is done on hard disks, the blocks that store file-system metadata or frequently updated data accumulate very large erase counts after a period of time and tend to wear out easily, while blocks with static data see very few erase cycles. Examples of such metadata blocks are those storing the File Allocation Table (FAT) when a FAT file system is deployed, or the Master File Table (MFT) under NTFS.
  • If data are written to free blocks and the system reclaims only blocks of invalid data, then block erases are distributed over the “free” area and the “dynamic” data area.
  • If data are written to free blocks and the system reclaims not only blocks with invalid data but also blocks with valid data, then block erases can be evenly distributed over every block of the flash memory, so that the endurance of the flash memory is extended.
  • Find the block with the least erase count. Some heuristic approaches erase a block when the garbage collector finds that the erase count of the block exceeds a given threshold.
  • In order to minimize overhead, the proposed mechanism should fit into existing management schemes with limited modifications.
  • Our approach communicates only with the Cleaner, so the only required modification is one interface function for the SW Leveler to communicate with the Cleaner. The goal is to prevent any cold data from staying in any block for a long period of time. The BET remembers which blocks have been erased within a pre-determined time frame. It is a bit array in which each bit corresponds to a set of contiguous blocks; whenever a block is erased, the corresponding bit is set. Therefore, if a bit is not set for a period of time, its corresponding blocks hold cold data, and that data should be moved out.
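The BET bookkeeping described above can be sketched as follows: one flag per set of 2^k contiguous blocks, set whenever a block in the set is erased. This is a minimal sketch under those assumptions; the class and method names are illustrative, not the authors' implementation.

```python
class BlockErasingTable:
    """Sketch of the Block Erasing Table (BET): one flag per set of 2^k
    contiguous blocks; a flag is set when any block in its set is erased.
    (Illustrative names, not the authors' implementation.)"""

    def __init__(self, num_blocks, k):
        self.k = k
        self.flags = [0] * (num_blocks >> k)  # bit array, one flag per set
        self.f_cnt = 0  # number of 1's in the BET
        self.e_cnt = 0  # block erases since the BET was last reset

    def record_erase(self, block):
        """Called whenever the Cleaner erases a block."""
        self.e_cnt += 1
        idx = block >> self.k  # index of the block's set
        if not self.flags[idx]:
            self.flags[idx] = 1
            self.f_cnt += 1

    def reset(self):
        self.flags = [0] * len(self.flags)
        self.f_cnt = 0
        self.e_cnt = 0
```

Flags that stay 0 for a long time then mark block sets holding cold data, which are candidates for static wear leveling.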
  • Note that the sequential scanning of blocks in the selection of block sets for static wear leveling is very effective in the implementation. We surmise that in practice this design behaves close to a random selection policy, because cold data can virtually exist in any block in the physical address space of the flash memory.
  • FTL with a large k: more data are moved at a time by the SW Leveler, so the better mixing of hot and non-hot data in the trace lets blocks be erased more evenly. Although the first failure time of FTL seems long, it could be substantially shortened when flash memory is adopted in designs with a higher access frequency.
  • For NAND flash memory, the electrodes of cells on the same row are connected together. Thank you for your time.
  • Because the floating gate of each flash-memory cell is becoming thinner and thinner, the electrons stored in it are more easily lost.

    1. 1. Endurance Enhancement of Flash-Memory Storage Systems: An Efficient Static Wear Leveling Design Yuan-Hao Chang , Jen-Wei Hsieh, and Tei-Wei Kuo Embedded Systems and Wireless Networking Laboratory Dept. of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan
    2. 2. Outline <ul><li>Introduction </li></ul><ul><li>System Architecture </li></ul><ul><li>Motivation </li></ul><ul><li>An Efficient Static Wear Leveling Mechanism </li></ul><ul><li>Performance Evaluation </li></ul><ul><li>Conclusion </li></ul>
    3. 3. Introduction - Why Flash Memory <ul><li>Diversified Application Domains </li></ul><ul><ul><li>Portable Storage Devices </li></ul></ul><ul><ul><li>Consumer Electronics </li></ul></ul><ul><ul><li>Industrial Applications </li></ul></ul><ul><li>SoC and Hybrid Devices </li></ul><ul><ul><li>Critical System Components </li></ul></ul>
    4. 4. Introduction - Characteristics of Flash Memory (figure: a 64MB SLC flash memory organized as 512 blocks of 64 pages each, every page holding 2KB + 64 bytes. Page: basic write-operation unit. Block: basic erase-operation unit.)
    5. 5. System Architecture * FTL : Flash Translation Layer, MTD : Memory Technology Device SD, xD, MemoryStick , SmartMedia CompactFlash
    6. 6. Policies – FTL <ul><li>FTL adopts a page-level address translation mechanism . </li></ul><ul><ul><li>Main problem: Large memory space requirement </li></ul></ul>
    7. 7. Policies – NFTL <ul><li>Each logical address under NFTL is divided into a virtual block address and a block offset . </li></ul><ul><ul><li>e.g., LBA=1011 => virtual block address (VBA) = 1011 / 8 = 126 and block offset = 1011 % 8 = 3 </li></ul></ul> (figure: the NFTL address translation table in main memory maps VBA=126 to the pair (9,23): a primary block at address 9 and a replacement block at address 23. A write to LBA=1011 goes to block offset 3 of the primary block; if that page has already been used, the data are written to the first free page of the replacement block.)
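The slide's address-splitting arithmetic can be reproduced with a one-line function (a sketch assuming 8 pages per block, as in the slide's example; the function name is illustrative):

```python
PAGES_PER_BLOCK = 8  # as in the slide's example

def nftl_translate(lba):
    """Split a logical address into (virtual block address, block offset)."""
    return lba // PAGES_PER_BLOCK, lba % PAGES_PER_BLOCK

print(nftl_translate(1011))  # prints (126, 3), matching the slide
```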
    8. 8. Outline <ul><li>Introduction </li></ul><ul><li>System Architecture </li></ul><ul><li>Motivation </li></ul><ul><li>An Efficient Static Wear Leveling Mechanism </li></ul><ul><li>Performance Evaluation </li></ul><ul><li>Conclusion </li></ul>
    9. 9. Motivation <ul><li>A limited bound on erase cycles </li></ul><ul><ul><li>SLC : 100,000 </li></ul></ul><ul><ul><li>MLC x2 : 10,000 </li></ul></ul><ul><li>Applications with frequent access </li></ul><ul><ul><li>Disk cache (e.g., Robson ) </li></ul></ul><ul><ul><li>Solid State Disks (SSD) </li></ul></ul>Solid State Disk
    10. 10. Motivation – Why Static Wear Leveling: No Wear Leveling (figure: Number of Write/Erase Cycles per PBA with 75% Static Data; Erase Cycles vs. Physical Block Addresses (PBA))
    11. 11. Motivation – Why Static Wear Leveling: Dynamic Wear Leveling (figure: Number of Write/Erase Cycles per PBA with 75% Static Data; Erase Cycles vs. Physical Block Addresses (PBA))
    12. 12. Motivation – Why Static Wear Leveling: Perfect Static Wear Leveling (figure: Number of Write/Erase Cycles per PBA with 75% Static Data; Erase Cycles vs. Physical Block Addresses (PBA))
    13. 13. Motivation – Why Static Wear Leveling is So Difficult? <ul><li>An intuitive SWL </li></ul><ul><li>Problems: </li></ul><ul><ul><li>Block erase </li></ul></ul><ul><ul><li>Live-page copying </li></ul></ul><ul><ul><li>RAM space </li></ul></ul><ul><ul><li>CPU time </li></ul></ul> (figure: a 16-block flash memory with a per-block erase counter; the legend marks free blocks, dead blocks, and blocks containing (some) valid data, plus an index used in the selection of a victim block and an index to the selected free block. Animation steps: update data in block 3, write new data to block 4, garbage collection starts and finds a victim block, update block 15, garbage collection starts again.)
    14. 14. Motivation – Comparison (figures: Erase Cycles vs. Physical Block Addresses (PBA) for No Wear Leveling, Dynamic Wear Leveling, Intuitive SWL, and Perfect SWL)
    15. 15. Outline <ul><li>Introduction </li></ul><ul><li>System Architecture </li></ul><ul><li>Motivation </li></ul><ul><li>An Efficient Static Wear Leveling Mechanism </li></ul><ul><li>Performance Evaluation </li></ul><ul><li>Conclusion </li></ul>
    16. 16. Design Considerations in Static Wear Leveling <ul><li>Scalability Issues </li></ul><ul><ul><li>The limitation on the </li></ul></ul><ul><ul><li>RAM space </li></ul></ul><ul><ul><li>The limitation on </li></ul></ul><ul><ul><li>computing power </li></ul></ul><ul><li>Compatibility Issue </li></ul><ul><ul><li>Compatibility with FTL and NFTL </li></ul></ul>
    17. 17. An Efficient Static Wear Leveling Mechanism <ul><li>A modular design for compatibility considerations </li></ul><ul><li>A SWL mechanism </li></ul><ul><ul><li>Block Erasing Table (BET) </li></ul></ul><ul><ul><ul><li>bit flags </li></ul></ul></ul><ul><ul><li>SW Leveler </li></ul></ul><ul><ul><ul><li>SWL-Procedure </li></ul></ul></ul><ul><ul><ul><li>SWL-Update </li></ul></ul></ul>
    18. 18. The Block Erasing Table (BET) <ul><li>A bit-array: Each bit is for 2^k consecutive blocks. </li></ul><ul><ul><li>Small k – in favor of hot-cold data separation </li></ul></ul><ul><ul><li>Large k – in favor of small RAM space </li></ul></ul> (figure: BET examples for a 16-block flash memory with k=0 and k=2, showing the corresponding flag being set when the Cleaner erases a block. f cnt : the number of 1’s in the BET; e cnt : the total number of block erases done since the BET was reset.)
    19. 19. <ul><li>An unevenness level ( e cnt / f cnt ) >= T → Triggering of the SW Leveler </li></ul><ul><li>Resetting of the BET when all flags are set. </li></ul>The SW Leveler (example with k=2 and T=1000): when e cnt reaches 2000 with f cnt = 2, e cnt / f cnt = 2000 / 2 = 1000 >= T, so the SW Leveler triggers the Cleaner to (1) copy the valid data of the selected block set to the free area, (2) erase the blocks in the selected block set, and (3) inform the Allocator to update the address mapping between LBA and PBA. The trigger fires again when e cnt reaches 3000 with f cnt = 3, and again when e cnt reaches 4000 with f cnt = 4; at that point all flags in the BET are 1, so the BET is reset ( e cnt = 0, f cnt = 0) and the scan restarts from a randomly selected block set. Legend: f cnt is the number of 1’s in the BET; e cnt is the total number of block erases since the BET was reset; T is a threshold (T=1000 in this example).
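The trigger condition walked through on this slide reduces to a small check (a sketch; the function name is illustrative, and the example values come from the slide):

```python
def should_trigger(e_cnt, f_cnt, T):
    """The SW Leveler fires when the unevenness level e_cnt / f_cnt
    reaches the threshold T (sketch of the slide's condition)."""
    return f_cnt > 0 and e_cnt / f_cnt >= T

# Values from the slide, with T = 1000:
print(should_trigger(2000, 2, 1000))  # prints True:  2000 / 2 = 1000 >= T
print(should_trigger(1999, 2, 1000))  # prints False: 999.5 < T
```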
    20. 20. Main-Memory Requirements for the BET, MLC x2 (1 page = 2 KB, 1 block = 128 pages). For flash capacities of 512MB / 1GB / 2GB / 4GB / 8GB: k=0: 256B / 512B / 1024B / 2048B / 4096B; k=1: 128B / 256B / 512B / 1024B / 2048B; k=2: 64B / 128B / 256B / 512B / 1024B; k=3: 32B / 64B / 128B / 256B / 512B.
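The table's entries follow directly from one bit per 2^k blocks. A quick sketch to reproduce them, assuming the stated MLC x2 geometry (128 pages of 2KB per block):

```python
BYTES_PER_BLOCK = 128 * 2 * 1024  # 128 pages x 2KB per page (MLC x2)

def bet_size_bytes(flash_bytes, k):
    """RAM needed by the BET: one bit per 2^k blocks, packed into bytes."""
    num_blocks = flash_bytes // BYTES_PER_BLOCK
    return (num_blocks >> k) // 8

print(bet_size_bytes(1 << 30, 0))  # prints 512, the 1GB / k=0 cell
print(bet_size_bytes(8 << 30, 3))  # prints 512, the 8GB / k=3 cell
```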
    21. 21. Outline <ul><li>Introduction </li></ul><ul><li>System Architecture </li></ul><ul><li>Motivation </li></ul><ul><li>An Efficient Static Wear Leveling Mechanism </li></ul><ul><li>Performance Evaluation </li></ul><ul><li>Conclusion </li></ul>
    22. 22. Performance Evaluation <ul><li>Metrics </li></ul><ul><ul><li>Endurance Enhancement (First Failure Time) </li></ul></ul><ul><ul><li>Additional Overheads (Extra Block Erases) </li></ul></ul><ul><li>Experiment Setup </li></ul><ul><ul><li>The FTL Layer </li></ul></ul><ul><ul><ul><li>FTL (Flash Translation Layer protocol) </li></ul></ul></ul><ul><ul><ul><li>NFTL (NAND Flash Translation Layer protocol) </li></ul></ul></ul><ul><ul><li>Traces: </li></ul></ul><ul><ul><ul><li>A mobile PC with a 20GB hard disk </li></ul></ul></ul><ul><ul><ul><li>Duration: One month. </li></ul></ul></ul><ul><ul><ul><li>Daily activities </li></ul></ul></ul><ul><ul><ul><li>The average number of Operations: 1.82 writes/sec, 1.97 reads/sec </li></ul></ul></ul><ul><ul><ul><li>The percentage of accessed LBA’s: 36.62% </li></ul></ul></ul><ul><ul><li>Flash Memory: 1GB MLCx2 </li></ul></ul>
    23. 23. Endurance Improvement - The First Failure Time When k=3 and T=100, the endurance is improved by 100.2% . When k=0 and T=100, the endurance is improved by 87.5% .
    24. 24. Additional Overhead - Extra Block Erases Extra block erases are less than 3%. Extra block erases are less than 2%.
    25. 25. Conclusion <ul><li>An Efficient Static Wear Leveling Mechanism </li></ul><ul><ul><li>Small Memory Space Requirement </li></ul></ul><ul><ul><ul><li>Less than 4KB for 8GB flash memory </li></ul></ul></ul><ul><ul><li>Efficient Implementation </li></ul></ul><ul><ul><ul><li>An adjustable house-keeping data structure </li></ul></ul></ul><ul><ul><ul><li>A cyclic queue scanning algorithm </li></ul></ul></ul><ul><li>Performance Evaluation </li></ul><ul><ul><li>Improvement Ratio of the Endurance: </li></ul></ul><ul><ul><ul><li>The ratio is more than 50% with proper settings of T and k </li></ul></ul></ul><ul><ul><li>Extra Overhead: </li></ul></ul><ul><ul><ul><li>Less than 3% extra block erases with proper settings </li></ul></ul></ul><ul><ul><ul><li>There is limited overhead in live-page copying </li></ul></ul></ul>
    26. 26. Conclusion <ul><li>Future work </li></ul><ul><ul><li>Endurance and Reliability Problems due to Manufacturing and Capacity Improvements </li></ul></ul><ul><ul><ul><li>Read/Write Disturbance Problems </li></ul></ul></ul><ul><ul><li>System Components </li></ul></ul><ul><ul><ul><li>Memory Hierarchy, Devices, Adaptors, etc. </li></ul></ul></ul><ul><ul><li>Benchmark Designs </li></ul></ul><ul><ul><ul><li>Targets on Different Designs </li></ul></ul></ul>Selected cell I DS Control Gate Drain Source
    27. 27. <ul><li>~ Thank You ~ </li></ul>
