Accelerating SSD Performance with HLNAND
Roland Schuetz, MOSAID Technologies Inc.
David Won, INDILINX
Presentation Outline
- PC Architecture and the Solid State Drive
- HLNAND Introduction
- HLNAND Enhances SSD Performance
- Conclusions
PC System Architecture: Where are the Bottlenecks?
- [Diagram: PC System Architecture]
- Migration of system memory is creating a new memory sub-architecture
- Utilize the upgraded PCIe link for the new storage memory sub-architecture
- Most memory interfaces and system interconnects have undergone dramatic BW upgrades over the years
- Bulk storage access BW lags (mechanical latency)
SSD Market Segments
Three categories:
- Enterprise
- Notebook / PC
- NetBook / Ultraportable
Three cost models:
- >$100 non-NAND BOM in Enterprise SSDs
- ~$5.00 non-NAND BOM in Notebook SSDs
- <$1.00 non-NAND BOM in NetBook / Ultraportable SSDs
Images: Enterprise (source: STEC), Notebook (source: SanDisk), NetBook / Ultraportable (source: STEC)
Enterprise Flash Storage: An Excellent Market for HLNAND SSDs
- Enterprise applications move huge amounts of data and are IO bound; these include file, web, and transaction servers, multimedia editing systems, and simulation servers/workstations
- Current flash-based enterprise applications are built on conventional 40-100 MB/s sub-systems
Image source: Violin 1010 Memory Appliance, Violin Memory, Inc.
Current SSD Offerings (1st Generation)
- Leading NAND, memory-module, and specialized SSD manufacturers offer conventional NAND-flash-based products
- Cost is pivotal for SSD adoption
- Little product and architectural differentiation: typically 4-channel, 4-way interleaved designs
- Similar cost structure: NAND flash constitutes the majority of the BOM; vertically integrated manufacturers have a pricing advantage
- Similar performance: 30-100 MB/s read/write; no competitor has a performance lock on the market
How are Current SSDs Stacking Up? Source: Engadget
Current SSDs vs. HDDs
HLNAND – A New High-Speed NAND Standard
HLNAND:
- Unidirectional, point-to-point, daisy-chain cascade with programmable link width (1-8 bits)
- Synchronous DDR signaling up to 800 Mb/s/pin
- Each ring supports up to 255 devices with no bandwidth degradation
- A single CE per ring enables controller pin-count reduction
- Low-power 1.8V I/O
Conventional NAND:
- 8-bit, bidirectional, multi-drop bus
- Asynchronous LVTTL signaling up to 40 Mb/s/pin
- Speed degradation with more than 4 devices on the bus
- A Chip Enable (CE) signal is required for each device
- Power-hungry 3.3V I/O
(A sketch of the per-link bandwidth implied by these signaling rates follows below.)
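The following minimal sketch (not from the slides) shows how the per-pin signaling rates above translate into the per-channel and per-ring bandwidth figures quoted later in the deck. The 266 Mb/s/pin HL1 rate is an assumption inferred from the 266 MB/s-per-ring numbers on the comparison slides.

```python
# Peak media bandwidth per channel/ring from per-pin rate and link width.
def peak_bandwidth_MBps(per_pin_rate_Mbps: float, link_width_bits: int) -> float:
    """Peak bandwidth in MB/s for a link of `link_width_bits` pins,
    each signaling at `per_pin_rate_Mbps` megabits per second."""
    return per_pin_rate_Mbps * link_width_bits / 8  # 8 bits per byte

print(peak_bandwidth_MBps(40, 8))   # conventional async NAND channel -> 40.0 MB/s
print(peak_bandwidth_MBps(266, 8))  # HLNAND HL1 ring (assumed pin rate) -> 266.0 MB/s
print(peak_bandwidth_MBps(800, 8))  # HLNAND HL2 ring -> 800.0 MB/s
```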
HLNAND – A New High-Speed NAND Standard (cont'd)
HLNAND:
- New device features enhance performance and simplify controller and SSD design
- A new, low-stress program scheme enables: random page program, page-pair erase, and multi-page and multi-block erase
Conventional NAND:
- No new features to enhance flash performance or simplify controller and SSD design
PC Demands on System Storage
- Typical PC use is IO intensive, causing frequent HDD/SSD access (e.g., creating a slide presentation, running a virus scan)
- Booting a PC is especially IO intensive, since the OS must load a large amount of data from bulk storage to DRAM
IO Operation Example: Can an SSD Deliver "Instant Boot"?
Populate 1GB of DRAM from flash-based bulk storage:
- A conventional flash SSD at 60 MB/s takes ~17 s
- An optimized conventional flash SSD at 95 MB/s* takes ~10 s
10 seconds is a very noticeable wait!
* Mtron SSD 32 GB: Performance with a Catch, Patrick Schmid and Achim Roos, November 21, 2007; SSD: Mtron MSDSATA6025032NA
IO Operation Example: Can an SSD Deliver "Instant Boot"? (cont'd)
Populate 1GB of DRAM from flash-based bulk storage:
- An HLNAND HL1 SSD at 266 MB/s takes ~3.8 s
- An HLNAND HL2 SSD at 800 MB/s takes ~1.3 s
HLNAND-based SSDs offer a 260%-450%, and up to 800%, IO rate improvement; 3.8 s is not instant, but it is much closer (a worked version of this calculation is sketched below).
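A minimal sketch of the boot-time arithmetic behind these two slides, assuming time is simply data size divided by sustained read bandwidth; real boot times also depend on access pattern and host overhead. The 1 GB working set and bandwidth figures are from the slides.

```python
# Time to populate DRAM = data size / sustained read bandwidth.
BOOT_IMAGE_MB = 1024  # 1 GB of DRAM to populate

for name, bw_MBps in [("Conventional SSD", 60),
                      ("Optimized conventional SSD", 95),
                      ("HLNAND HL1 SSD", 266),
                      ("HLNAND HL2 SSD", 800)]:
    print(f"{name}: {BOOT_IMAGE_MB / bw_MBps:.1f} s")
# -> ~17.1 s, ~10.8 s, ~3.9 s, ~1.3 s (matching the slides' approximations)
```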
IOPS are the Holy Grail in Enterprise Applications
SSDs offer significantly higher I/O operations per second (IOPS) than HDDs; HLNAND will shine (the scaling behind the HLNAND figure is sketched below):
- HDD (typical enterprise class): ~150
- SSD, SATA (advertised by SanDisk): 7,000
- SSD, 4G Fibre Channel (advertised by STEC): 45,000
- HLNAND: 300,000*
Source: SanDisk, STEC, and Deutsche Bank estimates
* Assumes a design similar to the STEC SSD, scaled by the media speed improvement: 45,000 × (266/40). Does not account for the 4GFC saturation limit.
Image: Holy Grail of the Monastery of Xenophontos, Macedonian Heritage, 2000-2008
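A small sketch of the footnoted estimate: scale a measured conventional-NAND SSD figure by the ratio of media interface speeds. The 45,000 IOPS baseline and the 266 vs. 40 MB/s media rates come from the slide; host-interface (4GFC) saturation is ignored, as the footnote states.

```python
baseline_iops = 45_000      # STEC 4G Fibre Channel SSD (advertised)
media_speedup = 266 / 40    # HLNAND HL1 ring vs. conventional NAND channel
print(round(baseline_iops * media_speedup))  # ~299,250, quoted as ~300,000 IOPS
```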
SSD Performance Comparison: HLNAND-based vs. NAND-based (128GB SSD using 128 × 8Gb SLC)
SSD controller for HLNAND HL1:
- 4 rings, each max. 266 MB/s @ x8b, 32 dies per ring
- Max. media BW = 1066 MB/s @ x8b across 4 rings
- Total 96 signal pins excluding power: 24 signal pins per ring
SSD controller for conventional NAND:
- Parallel bus, 8 channels, each max. 25 MB/s @ x8b, 16 dies per channel, no termination
- Max. media BW = 200 MB/s @ x8b across 8 channels
- Total 208 signal pins excluding power: 29 signal pins per channel, 8 CE# and 8 R/B# per channel
(A sketch of the aggregate-bandwidth arithmetic follows below.)
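A rough sketch of the aggregate-bandwidth and pin-count arithmetic on this slide. Per-ring and per-channel rates and counts are taken from the slide; the small rounding difference (4 × 266 = 1064 vs. the quoted 1066 MB/s) is in the source.

```python
hl_rings, hl_bw_per_ring, hl_pins_per_ring = 4, 266, 24
nand_channels, nand_bw_per_channel = 8, 25

print(hl_rings * hl_bw_per_ring)            # ~1066 MB/s max media BW (HLNAND, 4 rings)
print(hl_rings * hl_pins_per_ring)          # 96 controller signal pins (HLNAND)
print(nand_channels * nand_bw_per_channel)  # 200 MB/s max media BW (NAND, 8 channels)
```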
HLNAND SSD vs. NAND SSD
8-bit HLNAND-based SSD:
- 266 MB/s per ring
- 1 ring saturates SATA1 (1.5 Gb/s, 188 MB/s); 2 rings saturate SATA2 (3 Gb/s, 375 MB/s)
- Reduced pin count: 24 ctrl/data pins per ring
- Expanded feature set reduces controller overhead
- Power/media BW: 1.87 mW per MB/s
8-bit NAND-based SSD:
- Max. 40 MB/s per channel
- 4-5 channels to saturate SATA1; 8-10 channels to saturate SATA2
- Ctrl/data pin count: 29 per channel
- No new overhead-reducing features
- Power/media BW: 8.96 mW per MB/s
(A sketch of the SATA-saturation arithmetic follows below.)
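A minimal sketch of the SATA-saturation estimate: how many rings or channels of media bandwidth are needed to cover a host link. The link payload rates (188 and 375 MB/s after 8b/10b encoding) and the per-ring/per-channel rates are from the slide.

```python
import math

def links_needed(host_MBps: float, per_link_MBps: float) -> int:
    """Number of media rings/channels needed to match a host-link payload rate."""
    return math.ceil(host_MBps / per_link_MBps)

for host, rate in [("SATA1", 188), ("SATA2", 375)]:
    print(host,
          "HLNAND rings:", links_needed(rate, 266),
          "NAND channels:", links_needed(rate, 40))
# SATA1 -> 1 ring vs. 5 channels; SATA2 -> 2 rings vs. 10 channels
```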
HLNAND SSD vs. NAND SSD: Read Throughput, Experimental Results (media rate for each SSD). Plots from Mobile Embedded Lab, University of Seoul, 2008
HLNAND SSD vs. SSD vs. HDD: Estimates and Measured Performance, Read Throughput
HLNAND SSD vs. NAND SSD: Write Throughput, Experimental Results (media rate for each SSD). Plots from Mobile Embedded Lab, University of Seoul, 2008
HLNAND SSD vs. SSD vs. HDD: Estimates and Measured Performance, Write Throughput
SSD Controller Design: the Flash Translation Layer (FTL)
- The FTL is responsible for the high-cost jobs of wear-leveling and garbage collection
- Wear-leveling operations include: merge, switch, switch-after-copy, copy-after-PPE (Page-Pair Erase), and migration
HLNAND Improves FTL Performance
New HLNAND features reduce block-recycling costs:
- Page-pair erase
- Random page program
- Partial block erase
- Multi-block erase
Wear-Leveling Enhancement: Copy-After-Page-Pair-Erase
- Page-pair erase introduces low-cost wear-leveling opportunities
- This translates into greater system longevity and less controller overhead
Procedure: 1. page-pair erase, 2. copy (copy-after-PPE)
Cost-model notation (an illustrative sketch follows below):
- C_E: erase cost
- C_cp: copy cost
- M: number of blocks erased concurrently
- N_p: number of pages in a block
- k: number of page copies
- (symbol lost in extraction): additional copy overhead
[Figure: block-to-log mapping diagram; Blk 0 1 2 3 4, Log 0 1 2 4 4]
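The slide gives only the notation, not the cost equation itself. The following is an illustrative sketch, not the authors' exact model, of why copy-after-PPE can be cheaper than a conventional full-block merge, using the variables above; the overhead factor is named alpha here because its symbol was lost in extraction, and the example numbers are invented for illustration only.

```python
def merge_cost(C_E: float, C_cp: float, N_p: int) -> float:
    """Conventional merge: copy every page of a block to a new block, then erase."""
    return N_p * C_cp + C_E

def copy_after_ppe_cost(C_E: float, C_cp: float, k: int, M: int, alpha: float) -> float:
    """Copy-after-PPE (assumed form): amortize one erase over M concurrently erased
    blocks, then copy only the k pages that still need relocation, with overhead alpha."""
    return C_E / M + k * C_cp * (1 + alpha)

# Assumed example costs, for illustration only:
print(merge_cost(C_E=1.5, C_cp=0.2, N_p=64))                         # 14.3 units
print(copy_after_ppe_cost(C_E=1.5, C_cp=0.2, k=8, M=4, alpha=0.1))   # ~2.1 units
```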
Synthetic Workload Experiment
Configuration: x8b, 4KB page size, 256KB block size, 2048 blocks/bank, 1 bank
Workload mix and operation counts (reconstructed from the chart): Erase 38 (5%), Copyback 82 (10%), Program 263 (25%), Read 617 (60%)
Experiment performed by Mobile Embedded Lab, University of Seoul, 2008
(A capacity sanity check for this configuration is sketched below.)
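A small sanity check (not from the slides) of the flash geometry implied by the listed experiment parameters: pages per block and per-bank capacity.

```python
page_size_KB = 4
block_size_KB = 256
blocks_per_bank = 2048

pages_per_block = block_size_KB // page_size_KB            # 64 pages per block
bank_capacity_MB = blocks_per_bank * block_size_KB / 1024  # 512 MB per bank
print(pages_per_block, bank_capacity_MB)
```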
New Hierarchy for a New User Experience [Figures: Historical Storage Hierarchy vs. New Storage Hierarchy]
Conclusions
HLNAND's features contribute to accelerating SSD satisfaction and adoption:
- High-speed interface
- Low power consumption
- High scalability
- Interface extensibility
- Reduced overall cost with increased performance
- Advanced core features

Editor's Notes

  • #5 August 2008 Flash Memory Summit, Santa Clara, CA, USA
  • #12 Loading and saving from and to the HDD/SSD
  • #16 August 2008 Flash Memory Summit, Santa Clara, CA, USA
  • #17 How many conventional flash devices per channel are needed to saturate SATA1, SATA2, and SATA3?
  • #18 Comparable number of interface blocks in the controller; higher-speed IO in HLNAND rings allows higher BW; point-to-point connection facilitates higher IO rates per ring (channel); lower pin count on the HLNAND controller -> dependent on the number of rings, not the number of devices