Reliability Of Solid State Drives 2008
 


A talk on SSD reliability given at the conference in 2008

    Reliability Of Solid State Drives 2008 Presentation Transcript

    • Andrei Khurshudov, Sr. Director, SSD Q&R, Seagate Technology. Symposium on Magnetic Storage Tribology and Reliability, Miami, Florida, October 20, 2008
    • SSD – In One Page
      • SSD ≡ Solid State Drive
      • SSD is a storage device
        ◦ using solid state memory as components instead of heads and disks
        ◦ appearing to the user as a drive similar to a hard disk drive (HDD)
      • SSD uses non-volatile memory (NAND Flash) or volatile semiconductor memory (RAM) with a battery
      • Current SSD products utilize either SLC (single-level cell) or MLC (multi-level cell) NAND Flash
      • SSD benefits: read performance, higher reliability, low power consumption
      • SSD challenges: cost, product reliability over life, and write performance
    • Today and Tomorrow of SSD
      • Today’s total revenue ~ $400 M
      • Projected 2011 revenue ~ $5 B
      • Today’s unit shipments ~ 4M units
        ◦ Dominated by industrial applications
        ◦ Dominated by capacities <1 GB
      • Projected 2011 unit shipments ~ 50M units
        ◦ Dominated by shipments to portable PCs
        ◦ Dominated by capacities from 64 GB to 128 GB
      • The Total Cost of Ownership (TCO) is expected to drive the transition from HDDs to SSDs
        ◦ Conclusion: there is no need for complete price parity at equivalent capacity points
      • Source: IDC
    • Basic Flash Operation
      • Flash stores data by trapping charge at the floating gate
      • Direct access to data:
        ◦ Program (write) a “page” (2 KB or 4 KB + ECC bytes)
        ◦ Read a page
        ◦ Erase: the smallest unit is a block (64, 128, or more pages)
        ◦ Over-write = Erase (block) + Write (page)
      • Program / Erase operations:
        ◦ Program forces electrons in the substrate to tunnel through the oxide layer to be transported to and trapped on the floating gate (“0”)
        ◦ Erase forces electrons back to the substrate (“1”)
      • Read operation:
        ◦ Apply voltage to the control gate and sense the current through the inversion channel: “1” if there is a current flow, “0” if there is no current flow
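The page/block asymmetry on this slide can be sketched as a toy model. This is only an illustration under simple assumptions: the class name `NandBlock` and its methods are hypothetical, a block holds 64 pages, and an over-write is handled in place by saving the block's live pages, erasing the block, and programming everything back. Real controllers usually redirect the write to a fresh block instead.

```python
class NandBlock:
    """Toy model of one NAND block: pages are programmed one at a time, erased together."""

    def __init__(self, pages_per_block=64):
        # None marks an erased page (all bits read as "1").
        self.pages = [None] * pages_per_block
        self.erase_count = 0

    def program(self, page_index, data):
        # A page can only be programmed while it is in the erased state.
        if self.pages[page_index] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page_index] = data

    def read(self, page_index):
        return self.pages[page_index]

    def erase(self):
        # Erase acts on the whole block and consumes one P/E cycle.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1

    def overwrite(self, page_index, data):
        # Over-write = erase (block) + write (page): keep a copy of the live
        # pages, erase the block, then program every live page back.
        saved = list(self.pages)
        saved[page_index] = data
        self.erase()
        pages_written = 0
        for i, content in enumerate(saved):
            if content is not None:
                self.program(i, content)
                pages_written += 1
        return pages_written


block = NandBlock(pages_per_block=64)
for i in range(10):
    block.program(i, b"old")
# One host page is changed, but every live page in the block is rewritten.
print(block.overwrite(3, b"new"), "pages programmed,", block.erase_count, "block erase")
```

In this model a single-page update costs one block erase plus a rewrite of all ten live pages, which is the effect the talk later calls write multiplication.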
    • Program and Erase Cycle
      • [Figure: two cell cross-sections showing control gate, dielectric, floating gate, gate oxide, source and drain. Program: 20 V at the control gate, 0 V at the substrate side. Erase: 0 V at the control gate, 20 V at the substrate side.]
      • Program:
        ◦ Equivalent to “data write” in HDD
        ◦ Electrons are moved from the substrate and trapped in the floating gate
        ◦ Programming is done by “pages”
        ◦ Results in a logical “0”
        ◦ Uses Fowler-Nordheim tunneling
      • Erase:
        ◦ Equivalent to “data erase” in HDD
        ◦ Electrons are moved from the floating gate into the substrate
        ◦ Erasures are done by “blocks”
        ◦ Results in a logical “1”
        ◦ Uses Fowler-Nordheim tunneling
    • Flash Technology Trends
      • [Figures. Sources: J. Cooke, Micron Technology; Samsung]
      • Future roadmap for NAND charge storage technology: scaling down and increasing complexity
      • 10X reduction in reliability that needs to be compensated for by other means
      • Transition from SLC (single-level cells) to MLC (multi-level cells) will represent a significant challenge to Flash reliability
      • Not just writes but also reads have a degrading effect on flash data retention
    • Quality Assurance: HDD vs. SSD
      • HDD (mature industry: mature tests)
        ◦ Development and qualification tests: very similar across the industry
        ◦ Test conditions: consistent across the industry
        ◦ Test sample size and environments: very similar across the industry
        ◦ Firmware testing, validation, and issue handling: years of experience
        ◦ Acceleration factors: temperature understood; usage understood; voltage understood
        ◦ Reliability demonstration: standard RDT tests and standard data interpretation
        ◦ Reliability focus: head-disk interface, handling robustness
      • SSD (immature industry: non-uniform, inconsistent)
        ◦ Development and qualification tests: inconsistent across the industry
        ◦ Test conditions: inconsistent across the industry
        ◦ Test sample size, environments, and failure criteria: inconsistent across the industry
        ◦ Firmware testing, validation, and issue handling: little experience
        ◦ Acceleration factors: temperature similar; usage unclear; voltage not well defined
        ◦ Reliability demonstration: inconsistent across the industry
        ◦ Reliability focus: endurance (wearout), data retention, read and write disturb, wear-leveling algorithms
    • Major Failure Modes of NAND Flash
      • Flash-specific failure modes include:
        ◦ Program disturb: cells other than those being programmed receive elevated voltage. Can occur on a page that is not supposed to be programmed. Erase will return cells to the “normal” state
        ◦ Read disturb: affects cells within the block being read but on pages not being read. Erase will return cells to the “normal” state
        ◦ Data retention: charge loss or gain occurs in the cell over time. Erase will return cells to the “normal” state
        ◦ Endurance (wear-out): cell fails due to charge trapped in the dielectric layer. Not recoverable by erase
      • Other SSD failure modes:
        ◦ Handling damage
        ◦ EOS/ESD
        ◦ Firmware / ASIC failures
        ◦ Other failures
      • [Figure: programmed cell after P/E cycling, showing control gate, dielectric, floating gate, gate oxide (SiO2), source, drain and P-substrate]
    • SSD Endurance
      • [Figure: failure rate (%) versus true P/E cycles, time, or GB written, with regions labeled ß<1, ß=1 and ß>1 and the limit P/Emax marked]
      • Electrical effects:
        ◦ Faster programming, due to charges trapped in the dielectric instead of the FG (floating gate)
        ◦ Slower erasure, because the trapped charges are harder to remove than those in the FG
      • Program/Erase (P/E) cycles cause charge to be trapped in the dielectric layer
      • This causes a permanent shift in cell characteristics, which is not recovered by erase
      • Observed as failed program or erase status
      • In most cases, data could be recovered from the failed block
      • Blocks that fail should be retired (marked as bad and no longer used)
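The failure-rate figure on the endurance slide is labeled with ß<1, ß=1 and ß>1. Assuming ß is the Weibull shape parameter (an interpretation, the transcript does not say so), the sketch below shows how the three regimes correspond to a falling, flat and rising failure rate. The function name and the numeric inputs are illustrative only and are not values from the talk.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard (instantaneous failure rate) at time t > 0.

    beta is the shape parameter and eta the characteristic life. Both are
    illustrative inputs, not numbers taken from the presentation.
    """
    return (beta / eta) * (t / eta) ** (beta - 1)


for beta in (0.5, 1.0, 3.0):
    early = weibull_hazard(100.0, beta, eta=1000.0)
    late = weibull_hazard(900.0, beta, eta=1000.0)
    if late < early:
        trend = "falling (early-life failures)"
    elif late > early:
        trend = "rising (wear-out)"
    else:
        trend = "flat (random failures)"
    print(f"beta={beta}: failure rate is {trend}")
```

Under this reading, the wear-out region with ß>1 is where the device approaches P/Emax.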
    • SSD Endurance: Major Factors
      • Stress: number of P/E cycles
        ◦ External P/E cycles (host write data rate)
        ◦ Internal write multiplication
        ◦ External data entropy (block size distribution, application specific)
        ◦ Internal data handling (data buffering, Flash architecture, etc.)
        ◦ Wear-leveling efficiency (write uniformity across Flash cells)
      • Stress: operating environment
        ◦ Temperature (could both stress and help)
      • Strength:
        ◦ Flash endurance robustness
        ◦ Device ECC power
        ◦ Design redundancies or excess capacities
        ◦ Bad block identification and data re-assign mechanism
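The stress factors above can be combined into a rough lifetime estimate. The formula below is a common back-of-envelope approximation, not one given in the talk: total flash write capacity (capacity × maximum P/E cycles × wear-leveling efficiency) divided by the daily flash write volume (host writes × write multiplication). All parameter names and the example numbers are assumptions chosen for illustration; only the 10k P/E figure for 2-bit MLC comes from the talk.

```python
def estimated_life_years(capacity_gb, pe_max, host_gb_per_day,
                         write_multiplication=1.0,
                         wear_leveling_efficiency=1.0):
    """Rough SSD endurance life in years (back-of-envelope, assumed model)."""
    # Total data the flash can absorb before the average block reaches pe_max.
    total_flash_writes_gb = capacity_gb * pe_max * wear_leveling_efficiency
    # Each host write causes write_multiplication times as much flash writing.
    flash_gb_per_day = host_gb_per_day * write_multiplication
    return total_flash_writes_gb / flash_gb_per_day / 365.0


# Hypothetical 64 GB MLC drive at 10k P/E cycles, 20 GB/day of host writes,
# write multiplication of 10 and 80% wear-leveling efficiency.
years = estimated_life_years(64, 10_000, 20,
                             write_multiplication=10,
                             wear_leveling_efficiency=0.8)
print(f"estimated endurance life: {years:.1f} years")
```

With these assumed inputs the estimate comes out at roughly seven years. Halving the write multiplication doubles it, which is why the talk lists write reduction among the mitigations.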
    • Endurance: SLC vs. MLC
      • Multi-level cells use different charge levels to store two or more bits in one cell
      • Read/Write design margins (and the gaps between the Vt levels) are much smaller for MLC, resulting in lower endurance
      • [Figure. Source: W. Hutsell, Texas Memory Systems]
      • Transition to MLC would represent a significant reliability challenge
    • SSD Data Retention
      • [Figure: four cell cross-sections (control gate, dielectric, floating gate, gate oxide, source, drain, P-substrate): programmed cell; programmed cell after P/E cycling; programmed cell after long NOP storage; programmed cell after P/E cycling and long NOP storage]
      • Non-operating (NOP) storage causes charge to leak from the floating gate
      • P/E cycling leads to even faster charge dissipation and eventual data loss
    • Data Retention vs. Time and P/E cycles
      • P/E cycling shortens data retention
      • [Figures. Sources: Samsung; Jim Cooke, Micron. Chart annotations: “No P/E cycling impact on endurance”; “Strong endurance dependence on P/E cycling”]
      • Newer technologies shorten data retention
      • Exercising flash reduces its long-term data retention
      • This problem gets worse as Flash scales down (60 nm → 4x nm) and increases in complexity (SLC → MLC)
    • Understand and Overcome Fundamental Technology Limitations
      • Write endurance (max. program/erase cycles)
        ◦ Degrades with device scaling
        ◦ 100k for SLC NAND, 10k for MLC-2b, 1k for MLC-3b, 100 for MLC-4b
      • Data retention
        ◦ Degrades with device scaling
        ◦ Depends on temperature and P/E cycling
        ◦ 10 year retention @ up to 10% P/E cycles, 1 year retention @ 100% P/E cycles
      • Read disturb
        ◦ Degrades with device scaling
        ◦ 1M for SLC NAND, 100k for MLC-2b
      • Write multiplication
        ◦ Block erasure might lead to many additional internal writes for every host write
      • Mitigate Flash limitations with advanced reliability & test technologies
        ◦ Static and dynamic wear leveling to maximize life of the device
        ◦ Write reduction solutions
        ◦ Deploying increased ECC power
        ◦ SSD-specific test and qualification process (CERT, DMT, RDT, ORT, etc.)
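The benefit of wear leveling mentioned on this slide can be illustrated with a small simulation. This is a deliberately simplified sketch, not a description of any real controller: the function `simulate_max_wear` and its parameters are hypothetical, each write is assumed to cost one block erase, and the unleveled case assumes all writes land on a "hot" 10% of the blocks.

```python
import random


def simulate_max_wear(num_blocks, writes, wear_leveling, seed=0):
    """Return the highest per-block erase count after a number of block writes."""
    rng = random.Random(seed)
    erase_counts = [0] * num_blocks
    hot_blocks = max(1, num_blocks // 10)  # assumed hot region: 10% of blocks
    for _ in range(writes):
        if wear_leveling:
            # Leveling policy: always direct the write to the least-worn block.
            target = min(range(num_blocks), key=erase_counts.__getitem__)
        else:
            # No leveling: writes keep hitting the same small set of blocks.
            target = rng.randrange(hot_blocks)
        erase_counts[target] += 1
    return max(erase_counts)


print("max erases with wear leveling:   ", simulate_max_wear(100, 10_000, True))
print("max erases without wear leveling:", simulate_max_wear(100, 10_000, False))
```

In this toy setup, leveling spreads the 10,000 erases evenly at 100 per block, while the unleveled case pushes the worst block past 1,000, which is already at the 1k limit quoted above for MLC-3b.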
    • Predictive Life Modeling
      • SSD reliability modeling could potentially be more accurate than that for HDD. However, …
      • Failure mechanisms are highly interdependent and supplier-specific, which makes things difficult
      • Flash Component Quality and Reliability
        ◦ Superb quality control is required to compensate for high lot and part variability in a high-volume environment
        ◦ Component reliability correlation to a system and to the field & integration needs to be established
      • Standardization of the most critical tests and methodologies
        ◦ Need to establish common language and definitions
    • SSD future is bright and promising but dependent on several critical areas, including reliability
      • HDD to SSD transition rate will be a strong function of the total cost of ownership (TCO)
      • Reliability plays a critical role in reducing the TCO
      • SSD technology scaling is expected to have a negative impact on reliability
      • SSD reliability efforts should focus on the following major areas:
        ◦ Endurance
        ◦ Data retention
        ◦ Read / Program disturb
        ◦ Reliability enhancing technologies (wear leveling, ECC, etc.)
      • SSD test standardization is required:
        ◦ No “apple-to-apple” comparison will be possible otherwise
        ◦ TCO is difficult to estimate without having standard tests
    • Backup
    • JEDEC JC-64.8 was formed to focus on developing and coordinating SSD standards activity
      • JC-64.8 Co-Chairs: Alvin Cox, Seagate; Scott Graham, Micron
      • Participating companies: Intel, Microsoft, Micron, Samsung, Toshiba, Hitachi, LSI, Sandisk, Seagate, Dell, HP, Tyco, STEC, Marvell, Nvidia and others
      • [Organization chart: JC64 with task groups (Editorial TG; Embedded Memory Storage and Removable Memory Cards Roadmap TG; SaS TG; Enabling TG) and subcommittees JC64.1 Electrical, JC64.2 Mechanical, JC64.3 Host Controller, JC64.8 SSD; also shown: EJTG: eMMC, MJTG: MMC, UFS TG, MMCA]
      • JC-64.8 Subcommittee Scope: Solid State Drives
        ◦ Define/propose standards for solid state drives used for embedded or removable memory storage leveraging existing storage infrastructure…
        ◦ Include… quality, reliability, durability methods and procedures that are not included in the interface standards…