Reliability of ECC-based Memory Architectures with Online Self-repair Capabilities

  1. Reliability of ECC-Based Memory Architectures with Online Self-Repair Capabilities
     Gian Mayuga (1), Yuta Yamato (1), Tomokazu Yoneda (1), Yasuo Sato (2), Michiko Inoue (1)
     (1) Nara Institute of Science and Technology, Ikoma, Nara, Japan
     (2) Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
  2. Outline •  Research Background •  Proposed ECC-Based Memory Architecture •  Proposed Online Repair Strategy •  Reliability Evaluation •  Results •  Conclusion 12/19/14 IEICE DC 2
  3. Issues on Embedded Memory
     •  Memory takes up most of the area in large-scale SoCs
     •  Post-manufacturing failures occur frequently in memory
     •  Periodic field-level test and repair, together with error correction, are required to maintain reliability
     [Figure: SoC Area Partitioning. Percent of area by year (1999, 2000, 2005, 2008*, 2011*, 2014*, 2017*) for three series: % area new logic, % area reused logic, % area memory. Data labels on the chart: 20, 38, 44, 53, 58, 65, 69.]
     Source: Semico Research Corp, http://www.semico.com/content/semico-systems-chip-%E2%80%93-braver-new-world
  4. Memory Errors and Mechanisms for Repair and Correction
     •  Conventionally, errors are treated independently
        –  Hard error: Repair
        –  Soft error: Correction
     •  Recently, a Combined Approach is used
        –  Hard errors are also Corrected
        –  Errors in a memory word can be classified as:
           •  Uncorrectable error
           •  Correctable error
     •  Given m faulty bits in a memory word, the word can be corrected if the ECC can correct at least m bits
     [Diagram labels: Repair, Correction, Hard Errors (Row/Column Faults, Array Faults, Random Bit Faults), Soft Errors (Alpha Ray, Cosmic Ray), Memory word]
  5. Issue on Memory Word Reliability (1/2)
     •  Memory word with Uncorrectable error
        –  Must be Repaired
     •  Memory word with Correctable error
        –  Corrected
        –  More vulnerable if the word already has a faulty bit
     For example, using Single-Error Correction:
     •  Word with 2-bit Errors: replaced by a Spare Word
     •  Word with 1-bit Error: Corrected Word ✓
     •  Word with 1-bit Error that suffers a further error: Word failure
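The classification used on slides 4 and 5 can be illustrated with a short sketch. The names `WordState` and `classify_word` are hypothetical and do not come from the slides; the only rule taken from the source is that a word with m faulty bits is correctable when the ECC can correct at least m bits.

```python
from enum import Enum


class WordState(Enum):
    CLEAN = "clean"
    CORRECTABLE = "correctable"
    UNCORRECTABLE = "uncorrectable"


def classify_word(faulty_bits: int, ecc_capability: int = 1) -> WordState:
    """Classify a memory word by its number of faulty bits.

    ecc_capability is the number of bit errors the ECC can correct per word
    (1 models Single-Error Correction, the case used on the slides).
    """
    if faulty_bits < 0 or ecc_capability < 0:
        raise ValueError("bit counts must be non-negative")
    if faulty_bits == 0:
        return WordState.CLEAN
    if faulty_bits <= ecc_capability:
        return WordState.CORRECTABLE
    return WordState.UNCORRECTABLE


# A word with one hard error is still correctable under SEC ...
hard_errors = 1
print(classify_word(hard_errors).value)                # correctable
# ... but one extra soft error in the same word makes it uncorrectable.
soft_errors = 1
print(classify_word(hard_errors + soft_errors).value)  # uncorrectable
```

This is why a word that already holds a faulty bit is described as more vulnerable: it has no correction margin left for a later soft error.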
  6. Issue on Memory Word Reliability (2/2)
     •  Memory word with Uncorrectable error
        –  Must be repaired
     •  Memory word with Correctable error
        –  Can be corrected
        –  Can also be repaired
     [Diagram labels: Repair, Correction, Hard Errors (Row/Column Faults, Array Faults, Random Bit Faults), Soft Errors (Alpha Ray, Cosmic Ray)]
  7. Combined Approach of Repair and Correction
     Comparison of Studies using Combined Approach

     Reference | Target Error (ECC) | Target Error (Repair)         | Time Test Performed | Remark
     [2]       | Correctable        | Uncorrectable                 | Manufacturing       | Enhances Yield
     [5]       | Correctable        | Uncorrectable                 | In-Field            | Transparent BIST w/ ECC
     [6]       | Soft               | Hard                          | In-Field            | Repairs all Hard errors
     Proposed  | Correctable        | Uncorrectable and Correctable | In-Field            | Enhances Reliability

     [2] T.H. Wu, et al. A Memory Yield Improvement Scheme Combining Built-In Self-Repair and Error Correction Codes. ITC 2012.
     [5] C.L. Su, et al. An Integrated ECC and Redundancy Repair Scheme for Memory Reliability Enhancement. DFT 2005.
     [6] M. Nicolaidis, P. Papavramidou. Transparent BIST for ECC-Based Memory Repair. IOLTS 2013.
  8. Proposed Online Repair Scheme
     To enhance reliability, erroneous words are kept repaired with spare words for as long as possible
     •  Applies to both Uncorrectable and Correctable errors
        –  Repair (Address Remapping)
           •  A word with an Uncorrectable error is repaired
           •  A word with a Correctable error is repaired if spare space is available
           •  The remapping of a word with a Correctable error is cancelled if its spare word is needed
        –  Correction
  9. Test, Repair and Correction in Field
     •  Test (mBIST)
        –  Test performed to identify errors
     •  Repair (mBISR)
        –  Use Spare Space to remap the address of a memory word
        –  Physical replacement is performed in manufacturing test
     •  Correction (ECC)
        –  Use ECC to correct errors
     •  Scrubbing
        –  Re-write data to eliminate soft errors
     [Figure: Sample SoC with Embedded Memory. Labels: Processors, Cache, Cache, Logic block, Memory, Logic block, Test and Repair, Correction]
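To make the correction and scrubbing steps concrete, here is a minimal sketch using a Hamming(7,4) code. This code is chosen only because it is a small, well-known single-error-correcting code; the slides do not say which ECC the architecture uses, and a real memory would protect much wider words. All function names are illustrative.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # 1-based positions that hold data bits
PARITY_POSITIONS = (1, 2, 4)    # 1-based positions that hold parity bits


def syndrome(code):
    """XOR of the 1-based positions of all set bits; 0 for a valid codeword."""
    s = 0
    for position, bit in enumerate(code, start=1):
        if bit:
            s ^= position
    return s


def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword."""
    code = [0] * 7
    for position, bit in zip(DATA_POSITIONS, data):
        code[position - 1] = bit
    s = syndrome(code)
    # Choose parity bits so that the syndrome of the full codeword is 0.
    for position in PARITY_POSITIONS:
        code[position - 1] = 1 if s & position else 0
    return code


def scrub(code):
    """Return the word to write back; a single-bit error is corrected."""
    s = syndrome(code)
    fixed = list(code)
    if s:
        fixed[s - 1] ^= 1   # a nonzero syndrome names the position to flip
    return fixed


def decode(code):
    """Extract the 4 data bits from a codeword."""
    return [code[position - 1] for position in DATA_POSITIONS]


stored = encode([1, 0, 1, 1])

upset = list(stored)
upset[4] ^= 1                     # one soft error (bit flip)
assert scrub(upset) == stored     # scrubbing restores the stored word

double = list(stored)
double[0] ^= 1
double[1] ^= 1                    # two errors exceed the SEC capability
assert scrub(double) != stored    # the "correction" produces a wrong word
```

The second case shows why a word with two faulty bits cannot be left to a single-error-correcting ECC and has to be repaired instead.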
  10. Proposed Architecture and Strategy Overview
      Proposed ECC-Based Memory Architecture
      •  Normal Mode
      •  ECC and Scrubbing Controller
      •  Test Mode
      •  BIST and Diagnosis CAM
      •  Remap CAM and Remap Controller
         –  Proposed Online Repair Scheme
            »  Remap CAM Strategy
            »  Cases for Remap CAM
  11. Proposed ECC-Based Architecture: Normal Mode
      The Remap CAM is enabled for remapping operations, and ECC and Scrubbing are in effect
      •  Remap CAM: If a memory address is stored in it, accesses to that address are remapped to a spare address
      •  Scrubbing Controller: Protection against soft errors
      •  ECC: Protection against errors
  12. Proposed ECC-Based Architecture: Test Mode
      Test mode is performed when the memory is idle
      •  BIST: Performs the test and determines the error information
      •  Diagnosis CAM: Stores the error information, which includes the number of errors
      •  Remap Controller: Remaps faulty words based on the error classification
      •  Remap CAM: Stores the remapping information
  13. Remap CAM Strategy
      In this work, Single Error Correction ECC is used

      Proposed Scheme
      Spare Available? | No. of Hard Errors | Action | Remark
      Yes              | 1                  | Repair | -
      Yes              | ≥2                 | Repair | -
      Limited ~ None   | 1                  | -      | Correction
      Limited ~ None   | ≥2                 | Repair | Cancel word with 1 error

      Traditional Scheme
      No. of Hard Errors | Action | Remark
      1                  | -      | Correction
      ≥2                 | Repair | -
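The two strategy tables can be read as a small decision function. The sketch below is one possible encoding; the function name `remap_action`, the returned strings, and the reduction of "Limited ~ None" to a boolean `spare_available=False` are assumptions made for illustration.

```python
def remap_action(hard_errors: int, spare_available: bool,
                 scheme: str = "proposed") -> str:
    """Return the action for a faulty word under Single Error Correction.

    hard_errors     -- number of hard errors found in the word (>= 1)
    spare_available -- True if a free spare word exists (ignored by the
                       traditional scheme, which has no such column)
    scheme          -- "proposed" or "traditional"
    """
    if hard_errors < 1:
        raise ValueError("only faulty words need a decision")
    if scheme not in ("proposed", "traditional"):
        raise ValueError("unknown scheme: " + scheme)

    if hard_errors >= 2:
        # Uncorrectable under SEC: both schemes must repair the word.
        if scheme == "proposed" and not spare_available:
            return "repair (cancel a word with 1 error)"
        return "repair"

    # Exactly one hard error: correctable by the ECC.
    if scheme == "proposed" and spare_available:
        return "repair"
    return "correction"


print(remap_action(1, spare_available=True))                        # repair
print(remap_action(1, spare_available=True, scheme="traditional"))  # correction
print(remap_action(2, spare_available=False))  # repair (cancel a word with 1 error)
```

The only difference between the schemes is the single-error row: the proposed scheme spends free spare words on correctable words and reclaims them later if an uncorrectable word needs the space.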
  14. Uses Two Counters: RC2F and RC1F
      •  RC2F: counter for Faulty Words with 2 or more errors
      •  RC1F: counter for Faulty Words with 1 error
      •  RC2F and RC1F each point at the next address to be written to
      [Diagram: Remap CAM listing Faulty Address entries and Spare Address entries between the Top Address and the Bottom Address, with the RC2F and RC1F pointers]
  15. How Remap CAM works – Spare Available (1/2)
      •  Write a Faulty Address with ≥2 errors: the address is stored in the region for Faulty Words with 2 or more errors
      •  RC2F points at the next address to be written to
      [Diagram: Remap CAM with Faulty Address and Spare Address entries between the Top Address and the Bottom Address; the new Faulty Address is written at RC2F]
  16. How Remap CAM works – Spare Available (2/2)
      •  Write a Faulty Address with 1 error: the address is stored in the region for Faulty Words with 1 error
      •  RC1F points at the next address to be written to
      [Diagram: Remap CAM with Faulty Address and Spare Address entries between the Top Address and the Bottom Address; the new Faulty Address is written at RC1F]
  17. How Remap CAM works – FULL (1/2)
      •  Write a Faulty Address with ≥2 errors: a previous remapping is cancelled to make room for the new entry
      [Diagram: full Remap CAM with Faulty Address and Spare Address entries between the Top Address and the Bottom Address, with the RC2F and RC1F pointers]
  18. How Remap CAM works – FULL (2/2)
      •  Write a Faulty Address with 1 error: do nothing (the word is not remapped)
      [Diagram: full Remap CAM with Faulty Address and Spare Address entries between the Top Address and the Bottom Address, with the RC2F and RC1F pointers]
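The four Remap CAM cases can be simulated with a toy model. Several details below are assumptions, since the transcript does not preserve the diagrams: the ≥2-error region is taken to grow from the top and the 1-error region from the bottom, the cancelled remapping is taken to be the most recently written 1-error entry, and the `"unrepairable"` result for a CAM filled entirely with ≥2-error words is an invented placeholder. The class and method names are hypothetical.

```python
from typing import List, Optional, Tuple


class RemapCAM:
    """Toy model of a remap CAM shared by 1-error and >=2-error words."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        # entries[i] holds the faulty address remapped to spare word i.
        self.entries: List[Optional[int]] = [None] * size
        self.rc2f = 0          # next slot for a >=2-error word (assumed: from the top)
        self.rc1f = size - 1   # next slot for a 1-error word (assumed: from the bottom)

    def is_full(self) -> bool:
        # The two regions have met when the pointers cross.
        return self.rc2f > self.rc1f

    def write(self, address: int, hard_errors: int) -> Tuple[str, Optional[int]]:
        """Record a faulty word; return (action, cancelled_address)."""
        if hard_errors < 1:
            raise ValueError("only faulty words are written")
        if hard_errors == 1:
            if self.is_full():
                return ("do nothing", None)   # the ECC keeps correcting the word
            self.entries[self.rc1f] = address
            self.rc1f -= 1
            return ("repair", None)
        if not self.is_full():
            self.entries[self.rc2f] = address
            self.rc2f += 1
            return ("repair", None)
        if self.rc1f == len(self.entries) - 1:
            # Assumption: no 1-error entry is left to cancel.
            return ("unrepairable", None)
        # Full: overwrite the most recent 1-error entry, which sits at rc2f.
        cancelled = self.entries[self.rc2f]
        self.entries[self.rc2f] = address
        self.rc2f += 1
        self.rc1f += 1
        return ("repair", cancelled)

    def lookup(self, address: int) -> Optional[int]:
        """Spare-word index for a remapped address, or None if not remapped."""
        for spare_index, faulty in enumerate(self.entries):
            if faulty == address:
                return spare_index
        return None


cam = RemapCAM(size=3)
print(cam.write(0x10, 1))   # ('repair', None)
print(cam.write(0x20, 1))   # ('repair', None)
print(cam.write(0x30, 2))   # ('repair', None), the CAM is now full
print(cam.write(0x40, 2))   # ('repair', 32), the remapping of 0x20 is cancelled
print(cam.write(0x50, 1))   # ('do nothing', None)
```

After the cancellation, `cam.lookup(0x20)` returns `None`, so accesses to that word go back to the original location and rely on ECC correction alone.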
  19. Reliability of Proposed Architecture
      •  Timeline from time 0: Normal Mode (ECC and scrubbing active), 1st Self-Test and Repair period, Normal Mode, 2nd Self-Test and Repair period, and so on up to the k-th Self-Test and Repair period
      •  Hard errors follow a Poisson distribution with λh = 10^-11, 10^-10
      •  Soft errors follow a Poisson distribution with λs = 10^-7
      •  The memory is reliable if, at any given time, every error is either repaired or corrected, so that no uncorrectable error remains
      •  Self-test: a March-like test that repeatedly performs writes and reads on the cells
      •  Assumption: there is little chance of a soft error between a read and the last write
  20. Reliability Evaluation
      •  Evaluation done using R
      •  Conditions: λh << λs, since hard errors occur less frequently
      •  Scrubbing period: every 6 minutes
      •  Self-test period: every 10 days
      •  Reliability observed over a span of up to 50 years
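To get a feel for these parameters, the sketch below applies the standard Poisson result that the probability of at least one event in an interval of length t is 1 - exp(-λt). It is not the authors' R evaluation. The slides give no unit for the rates, so treating them as per-hour rates is purely an assumption, as is the 365-day year; the function name is hypothetical.

```python
import math

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365   # assumption: leap days ignored


def prob_at_least_one(rate: float, duration: float) -> float:
    """P(at least one event) for a Poisson process: 1 - exp(-rate * duration)."""
    if rate < 0 or duration < 0:
        raise ValueError("rate and duration must be non-negative")
    # expm1 keeps precision when rate * duration is very small.
    return -math.expm1(-rate * duration)


# ASSUMPTION: rates are per hour; the slides do not state the unit.
LAMBDA_HARD = (1e-11, 1e-10)
LAMBDA_SOFT = 1e-7

scrub_period_h = 6 / 60                      # every 6 minutes
self_test_period_h = 10 * HOURS_PER_DAY      # every 10 days
horizon_h = 50 * DAYS_PER_YEAR * HOURS_PER_DAY

self_tests = int(horizon_h // self_test_period_h)
print("self-test periods in 50 years:", self_tests)   # 1825

print("P(soft error within one scrubbing period):",
      prob_at_least_one(LAMBDA_SOFT, scrub_period_h))
for lam in LAMBDA_HARD:
    print("P(hard error within one self-test period), rate", lam, ":",
          prob_at_least_one(lam, self_test_period_h))
```

Under these assumed units, each probability refers to whatever unit (bit or word) the rate is defined for; the full reliability figure also depends on memory size and spare count, which the slides do not list.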
  21. Remapping Scheme
      •  Proposed: the proposed method
      •  Traditional: repair only Uncorrectable words
  22. Reliability of proposed scheme vs traditional scheme, λh = 10^-11
      Hard errors do not occur frequently, and reliability is expected to be near 1
      [Plot: Reliability (vertical axis) vs. Years (horizontal axis)]
  23. Reliability of proposed scheme vs traditional scheme, λh = 10^-10
      When hard errors occur more often, the proposed scheme may use up all the spare words, but it still has better reliability than the traditional scheme
      [Plot: Reliability (vertical axis) vs. Years (horizontal axis)]
  24. Conclusion
      •  For an ECC-based memory architecture, an online repair scheme is proposed that repairs uncorrectable words and, where possible, correctable words
      •  A novel memory reconfiguration using a remap CAM is proposed
      •  Reliability is evaluated under Poisson distributions of hard errors and soft errors
      •  The proposed scheme is shown to improve reliability and extend memory lifetime