Cost-Efficient Memory Architecture Design of NAND Flash Memory



  1. Cost-Efficient Memory Architecture Design of NAND Flash Memory Embedded Systems. Chanik Park et al., Proceedings of the ICCD 2003
  2. Introduction <ul><li>Objective </li></ul><ul><ul><li>Cost-efficient NAND flash memory architecture for XIP (execute-in-place) </li></ul></ul><ul><li>Why NAND flash memory? </li></ul><ul><ul><li>NAND offers </li></ul></ul><ul><ul><ul><li>extremely high cell densities </li></ul></ul></ul><ul><ul><ul><li>high capacity </li></ul></ul></ul><ul><ul><ul><li>fast write and erase rates </li></ul></ul></ul><ul><ul><ul><li>low cost </li></ul></ul></ul>
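  3. Background <ul><li>NAND vs. NOR </li></ul><table><tr><th></th><th>NOR</th><th>NAND</th></tr><tr><td>Capacity</td><td>1MB-32MB</td><td>16MB-512MB</td></tr><tr><td>XIP</td><td>Yes</td><td>No</td></tr><tr><td>Performance</td><td>Very slow erase, slow write, fast read</td><td>Fast erase/write/read (long initial latency, fast serial read)</td></tr><tr><td>Life span</td><td>Less than 10% of NAND</td><td>Over 10 times more than NOR</td></tr><tr><td>Interface</td><td>Full memory interface</td><td>I/O (CLE, ALE, OLE signal toggle)</td></tr><tr><td>Access mode</td><td>Random access</td><td>Sequential access</td></tr><tr><td>Ideal usage</td><td>Code storage</td><td>Data storage</td></tr><tr><td>Price</td><td>High</td><td>Low</td></tr><tr><td>Erase block</td><td>64KB – 128KB</td><td>8KB – 64KB</td></tr></table>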
  4. Background <ul><li>Memory Device Characteristics </li></ul><ul><ul><li>Mobile SDRAM </li></ul></ul><ul><ul><ul><li>Good performance and price, but high power consumption </li></ul></ul></ul><ul><ul><li>Low-power SRAM & Fast SRAM </li></ul></ul><ul><ul><ul><li>Very good performance, but high cost </li></ul></ul></ul><ul><ul><li>NOR & NAND </li></ul></ul><ul><ul><ul><li>Compared on cost, power consumption, and read/write/erase performance </li></ul></ul></ul>
  5. Background <ul><li>Mobile Embedded System Architecture </li></ul><ul><ul><li>Voice-centric 2G </li></ul></ul><ul><ul><ul><li>Appropriate for low-end phones, which require medium performance and cost </li></ul></ul></ul><ul><ul><ul><li>Cannot meet multimedia applications' need for high performance and large storage </li></ul></ul></ul>
  6. Background <ul><ul><li>Data-centric 2.5G </li></ul></ul><ul><ul><ul><li>NOR for code storage & NAND for data storage </li></ul></ul></ul><ul><ul><ul><li>Still insufficient for 3G real-time applications </li></ul></ul></ul><ul><ul><ul><li>The larger number of components increases system cost </li></ul></ul></ul><ul><ul><li>3G & Smartphones </li></ul></ul><ul><ul><ul><li>NAND flash for code/data storage </li></ul></ul></ul><ul><ul><ul><li>Uses the shadowing technique </li></ul></ul></ul><ul><ul><ul><ul><li>The code image is copied into system RAM at boot time and executed from there </li></ul></ul></ul></ul><ul><ul><ul><ul><li>High performance, but a slow boot process and high power consumption (SDRAM) </li></ul></ul></ul></ul><ul><ul><ul><li>Adoption of demand paging is needed </li></ul></ul></ul><ul><ul><ul><ul><li>But it is not applicable to low-end or mid-range systems </li></ul></ul></ul></ul><ul><ul><li>NAND XIP itself is needed! </li></ul></ul>
  7. NAND XIP <ul><li>NAND flash characteristics </li></ul><ul><ul><li>Structure </li></ul></ul><ul><ul><ul><li>Fixed number of blocks, with 32 pages in each block </li></ul></ul></ul><ul><ul><ul><li>Each page consists of 512 bytes of data and 16 bytes of spare area for auxiliary information (bad block ID or ECC data) </li></ul></ul></ul><ul><ul><li>Read/Write/Erase </li></ul></ul><ul><ul><ul><li>Read/write is performed in units of pages </li></ul></ul></ul><ul><ul><ul><li>Erase is performed in units of blocks </li></ul></ul></ul><ul><ul><li>Reliability </li></ul></ul><ul><ul><ul><li>Bad block management </li></ul></ul></ul><ul><ul><ul><li>EDC/ECC for bit flips </li></ul></ul></ul>
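The page and block geometry on this slide can be made concrete with a little address arithmetic. The following is a minimal sketch, assuming a linear byte address over the data areas only; the function names `split_address` and `block_data_bytes` are illustrative and do not come from the paper.

```python
# Geometry taken from the slide: 512-byte data area plus 16-byte spare area
# per page, and 32 pages per block.
PAGE_DATA_BYTES = 512
PAGE_SPARE_BYTES = 16
PAGES_PER_BLOCK = 32


def split_address(byte_addr):
    """Split a linear data address into (block, page_in_block, byte_in_page).

    Assumption: the address space covers data areas only (spare bytes are
    not addressable through it).
    """
    page_no, byte_in_page = divmod(byte_addr, PAGE_DATA_BYTES)
    block, page_in_block = divmod(page_no, PAGES_PER_BLOCK)
    return block, page_in_block, byte_in_page


def block_data_bytes():
    """Data bytes that are erased together when one block is erased."""
    return PAGE_DATA_BYTES * PAGES_PER_BLOCK


print(split_address(0x12345))  # (4, 17, 325)
print(block_data_bytes())      # 16384
```

With these numbers, one erase block holds 16KB of data, so a read touches at most one 512-byte page while an erase always affects 32 pages at once.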
  8. NAND XIP <ul><li>Basic Implementation </li></ul><ul><ul><li>NAND XIP is implemented using </li></ul></ul><ul><ul><ul><li>A small buffer </li></ul></ul></ul><ul><ul><ul><li>Conversion between the I/O interface and a memory interface </li></ul></ul></ul><ul><ul><li>Limitation </li></ul></ul><ul><ul><ul><li>Poor average access performance </li></ul></ul></ul><ul><ul><ul><li>Currently, the basic XIP area is limited to boot code </li></ul></ul></ul>
  9. NAND XIP <ul><li>Obstacles to general NAND XIP </li></ul><ul><ul><li>Average memory access time </li></ul></ul><ul><ul><ul><li>Average access time of NAND flash should be comparable to that of other memories </li></ul></ul></ul><ul><ul><li>Worst-case handling </li></ul></ul><ul><ul><ul><li>Cache miss handling is a critical problem in real-time environments </li></ul></ul></ul><ul><ul><li>Bad block management </li></ul></ul><ul><ul><ul><li>Must hide the memory space discontinuity caused by bad blocks </li></ul></ul></ul><ul><li>Approach of this paper: Intelligent Caching </li></ul><ul><ul><li>Highest cache hit ratio through priority-based caching </li></ul></ul><ul><ul><li>Reduced access latency through profile-based prefetching </li></ul></ul><ul><ul><li>Bad block management using PAT (page address translation) </li></ul></ul>
  10. Intelligent Caching Architecture <ul><li>Profile-guided static analysis </li></ul><ul><ul><li>The profiling process statically gathers the following information </li></ul></ul><ul><ul><ul><li>Access pattern </li></ul></ul></ul><ul><ul><ul><li>Prefetching information </li></ul></ul></ul><ul><ul><li>Code pages are divided into </li></ul></ul><ul><ul><ul><li>High priority: OS code, system libraries, real-time applications </li></ul></ul></ul><ul><ul><ul><li>Mid priority: normal application code </li></ul></ul></ul><ul><ul><ul><li>Low priority: sequential or bootstrapping code </li></ul></ul></ul><ul><ul><li>Page priority and prefetching information are stored in the spare area and used by the cache controller </li></ul></ul>
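One way to picture "priority and prefetching information stored in the spare area" is as a small packed record. The sketch below is only an illustration: the byte layout (one priority byte followed by a 4-byte page number to prefetch), the `NO_PREFETCH` sentinel, and the names `pack_spare` and `unpack_spare` are all assumptions. The slides do not give the real layout, and a real spare area also has to hold ECC and bad block data.

```python
import struct

SPARE_BYTES = 16              # spare area size per page, from the slides
NO_PREFETCH = 0xFFFFFFFF      # hypothetical "nothing to prefetch" marker
PRIORITY = {"low": 0, "mid": 1, "high": 2}
PRIORITY_NAME = {code: name for name, code in PRIORITY.items()}

# Hypothetical layout: 1-byte priority, then a 4-byte little-endian page number.
_LAYOUT = struct.Struct("<BI")


def pack_spare(priority, prefetch_page=None):
    """Build a 16-byte spare area image holding priority and prefetch hint."""
    target = NO_PREFETCH if prefetch_page is None else prefetch_page
    header = _LAYOUT.pack(PRIORITY[priority], target)
    # Pad with 0xFF, the value of erased flash bytes.
    return header + b"\xff" * (SPARE_BYTES - len(header))


def unpack_spare(spare):
    """Recover (priority name, prefetch page or None) from a spare image."""
    code, target = _LAYOUT.unpack_from(spare)
    return PRIORITY_NAME[code], (None if target == NO_PREFETCH else target)


spare = pack_spare("high", prefetch_page=42)
print(len(spare), unpack_spare(spare))  # 16 ('high', 42)
```

A cache controller reading a page would get this record along with the data, so it needs no separate lookup to decide how to treat the page or what to fetch next.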
  11. Intelligent Caching Architecture <ul><li>Victim Cache </li></ul><ul><ul><li>A small, fully associative cache </li></ul></ul><ul><ul><li>Stores blocks evicted from the main cache (swap operation) </li></ul></ul><ul><ul><li>Prevents unnecessary conflict misses </li></ul></ul><ul><li>PAT (page address translation) </li></ul><ul><ul><li>Bad block management </li></ul></ul><ul><ul><ul><li>Remaps pages in bad blocks to pages in good blocks </li></ul></ul></ul><ul><ul><li>Assists management of low-priority pages </li></ul></ul><ul><ul><ul><li>by remapping requested pages to swapped pages in system memory </li></ul></ul></ul>
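The bad block role of the PAT can be sketched as a block-level remapping table consulted on every page access. This is a simplified illustration, not the paper's hardware design: the class name `PageAddressTable`, the idea of a reserved list of spare blocks, and block-granularity remapping are assumptions made for the example.

```python
PAGES_PER_BLOCK = 32  # from the NAND structure described in the slides


class PageAddressTable:
    """Hypothetical PAT that hides bad blocks behind reserved good blocks."""

    def __init__(self, bad_blocks, spare_blocks):
        # Pair each bad block with one reserved good block.
        bad = sorted(bad_blocks)
        spares = list(spare_blocks)
        if len(spares) < len(bad):
            raise ValueError("not enough spare blocks to cover bad blocks")
        self.block_map = dict(zip(bad, spares))

    def translate(self, page_no):
        """Map a logical page number to the physical page actually read."""
        block, offset = divmod(page_no, PAGES_PER_BLOCK)
        block = self.block_map.get(block, block)  # unchanged if block is good
        return block * PAGES_PER_BLOCK + offset


pat = PageAddressTable(bad_blocks={3}, spare_blocks=[100])
print(pat.translate(3 * 32 + 5))  # 3205, i.e. page 5 of block 100
print(pat.translate(2 * 32 + 5))  # 69, block 2 is good so nothing changes
```

Because the translation happens below the address seen by the CPU, executing code sees a contiguous address space even when the underlying blocks are not.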
  12. Intelligent Caching Architecture
  13. Intelligent Caching Architecture <ul><li>Scenario </li></ul><ul><ul><li>Request A </li></ul></ul><ul><ul><ul><li>A is cached in main cache </li></ul></ul></ul><ul><ul><li>Request B (conflict with A) </li></ul></ul><ul><ul><ul><li>B is moved to system memory </li></ul></ul></ul><ul><ul><ul><li>PAT is updated to remap B </li></ul></ul></ul><ul><ul><li>Request C (conflict with A) </li></ul></ul><ul><ul><ul><li>C replaces A in main cache </li></ul></ul></ul><ul><ul><ul><li>A is swapped to victim cache </li></ul></ul></ul>
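The scenario on this slide can be replayed with a toy model. The sketch below assumes a direct-mapped main cache, a FIFO victim cache, and that B is a low-priority page (which is why it goes to system memory instead of evicting A); none of these details are confirmed by the slides, and the class name `IntelligentCache` and the returned status strings are made up for illustration.

```python
from collections import OrderedDict


class IntelligentCache:
    """Toy model: direct-mapped main cache + victim cache + PAT remapping."""

    def __init__(self, num_sets, victim_entries, low_priority_pages):
        self.num_sets = num_sets
        self.main = {}                    # set index -> cached page
        self.victim = OrderedDict()       # evicted pages, oldest first
        self.victim_entries = victim_entries
        self.low_priority = set(low_priority_pages)
        self.remapped = set()             # pages the PAT points at system memory

    def _evict_to_victim(self, index):
        old = self.main.get(index)
        if old is None:
            return
        if len(self.victim) >= self.victim_entries:
            self.victim.popitem(last=False)   # drop the oldest victim entry
        self.victim[old] = True

    def request(self, page):
        index = page % self.num_sets
        if self.main.get(index) == page:
            return "main hit"
        if page in self.remapped:
            return "system memory hit"
        if page in self.victim:
            # Swap: the page returns to the main cache, the occupant leaves.
            del self.victim[page]
            self._evict_to_victim(index)
            self.main[index] = page
            return "victim hit"
        if page in self.low_priority:
            # Low-priority pages bypass the cache; the PAT remembers them.
            self.remapped.add(page)
            return "miss -> system memory"
        self._evict_to_victim(index)
        self.main[index] = page
        return "miss -> main cache"


A, B, C = 0, 8, 16   # all map to set 0 when num_sets is 8
cache = IntelligentCache(num_sets=8, victim_entries=2, low_priority_pages={B})
for page in (A, B, C, A):
    print(cache.request(page))
```

The final request for A is served from the victim cache rather than from NAND, which is the conflict miss the victim cache is meant to avoid.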
  14. Experimental Setup <ul><li>Prototype NAND XIP board </li></ul><ul><ul><li>32MB NAND flash </li></ul></ul><ul><ul><li>256KB main cache </li></ul></ul><ul><ul><li>4KB victim cache </li></ul></ul><ul><ul><li>10KB SRAM for tag data </li></ul></ul><ul><li>NAND miss penalty (one page) </li></ul><ul><ul><li>About 35us: latency (10us) + page read (512 * 50ns = 25.6us) </li></ul></ul>
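The miss penalty arithmetic, and how it feeds into average memory access time, can be checked in a few lines. The latency, page size, and 50ns per-byte read time come from the slide; the standard relation "average time = hit time + miss rate * miss penalty" is used here as an assumption about how the averages are formed, and the hit time and miss rate in the usage line are made-up values, not measurements from the paper.

```python
# Values from the slide.
LATENCY_US = 10.0        # initial NAND access latency
BYTES_PER_PAGE = 512
NS_PER_BYTE = 50.0       # serial read time per byte


def miss_penalty_us():
    """Time to fetch one page from NAND on a cache miss, in microseconds."""
    return LATENCY_US + BYTES_PER_PAGE * NS_PER_BYTE / 1000.0


def average_access_time_us(hit_time_us, miss_rate):
    """Standard average memory access time: hit time + miss rate * penalty."""
    return hit_time_us + miss_rate * miss_penalty_us()


print(round(miss_penalty_us(), 1))  # 35.6
# Hypothetical cache: 0.05us hit time and a 1% miss rate.
print(round(average_access_time_us(0.05, 0.01), 3))  # 0.406
```

The penalty is several hundred times a plausible hit time, which is why the hit ratio dominates the average and why the worst case stays hard for real-time code.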
  15. Experimental Results <ul><li>Average Memory Access Time </li></ul><ul><ul><li>SDRAM shadowing </li></ul></ul><ul><ul><li>NAND XIP (priority): 32KB cache </li></ul></ul><ul><ul><li>NOR XIP </li></ul></ul><ul><ul><li>NAND XIP (basic): 32KB cache </li></ul></ul>
  16. Experimental Results <ul><li>Energy Consumption </li></ul><ul><ul><li>NOR XIP </li></ul></ul><ul><ul><li>NAND XIP (priority): 32KB cache </li></ul></ul><ul><ul><li>NAND XIP (basic): 32KB cache </li></ul></ul><ul><ul><li>SDRAM shadowing </li></ul></ul>
  17. Experimental Results <ul><li>Boot Time & Cost </li></ul><ul><ul><li>NAND XIP shows a reasonable boot time at low cost </li></ul></ul>
  18. Conclusion <ul><li>NAND XIP is feasible </li></ul><ul><ul><li>Experiments show the feasibility of the proposed architecture in a real-life mobile embedded environment </li></ul></ul><ul><ul><li>This is achieved by applying highly optimized caching techniques geared to the specific features of NAND flash and its applications </li></ul></ul><ul><li>Still, a more system-wide approach is needed </li></ul><ul><ul><li>Worst-case handling is still not easy </li></ul></ul><ul><ul><li>A new task scheduling algorithm that accounts for NAND flash operations would help </li></ul></ul>