A Prototype Storage Subsystem based on Phase Change Memory

IBM scientists demonstrated, for the first time, a hybrid storage and caching subsystem code-named Project Theseus at the 2014 Non-Volatile Memories Workshop in San Diego, California. Remarkably, the results were achieved with two-year-old PCM chip prototypes.

  • PCM is emerging as one of the most promising NVM technologies and is becoming more and more mature. The active material in PCM is a chalcogenide alloy placed between two electrically conducting electrodes in each cell of the memory array. A current is then applied that heats the material and causes a phase transition.
  • We focus on a PCM-based system that uses PCM in a persistent storage device attached to the I/O bus of the system.
  • One of the main concerns regarding the use of PCM for high-performance applications is its relatively high write latency. The purpose of this work has been to architect a PCM-based device and implement a fully functional, high-performance PCI-e card that takes advantage of the characteristics of PCM and mitigates its limitations. The design of PSS targets storage workloads that are dominated by small (e.g., 4 kB) random I/O operations and aims to achieve low, predictable latency for both reads and writes.
  • The Micron P5Q PCM part used was commercially available when the project started (end of 2012)

    1. PSS: A Prototype Storage Subsystem based on PCM
       IBM Research – Zurich: Ioannis Koltsidas, Roman Pletka, Peter Mueller, Thomas Weigold, Evangelos Eleftheriou
       University of Patras: Maria Varsamou, Athina Ntalla, Elina Bougioukou, Aspasia Palli, Theodore Antonakopoulos
    2. Phase Change Memory (PCM)
       • Based on the thermal threshold switching effect of chalcogenide materials
       • Two phases: the amorphous phase (Reset) and the crystalline phase (Set)
       • The phases have very different electrical resistances (a ratio of 1:100 to 1:1000)
       • Transition between phases by controlled heating and cooling
       • Read time: 100-300 nsec; program time: 10-150 μsec
       • PCM cells can be reprogrammed at least 10^6 times
       • Performance and price characteristics between DRAM and Flash
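A quick way to see why the 1:100 resistance contrast on slide 2 makes single-bit readout robust: a single threshold placed between the Set and Reset resistance distributions separates the two states with ample margin. A minimal sketch; the absolute resistance values are illustrative assumptions, since the deck only gives the ratio:

```python
# Toy model of single-bit PCM readout, based on slide 2: the crystalline
# (Set) and amorphous (Reset) phases differ in resistance by 100-1000x,
# so a single threshold between the two states recovers the stored bit.
# The absolute resistances are assumed; the deck only quotes the ratio.

R_SET = 10e3        # ohms, crystalline phase (assumed order of magnitude)
R_RATIO = 100       # lower bound of the quoted 1:100 to 1:1000 ratio
R_RESET = R_SET * R_RATIO

# Place the threshold at the geometric mean, equidistant on a log scale.
R_THRESHOLD = (R_SET * R_RESET) ** 0.5

def read_bit(resistance_ohms: float) -> int:
    """Map a sensed cell resistance to a bit: Set = 1 (low R), Reset = 0 (high R)."""
    return 1 if resistance_ohms < R_THRESHOLD else 0

assert read_bit(R_SET) == 1 and read_bit(R_RESET) == 0
# Even a 3x drift of either state leaves a wide margin at a 100x ratio.
assert read_bit(R_SET * 3) == 1 and read_bit(R_RESET / 3) == 0
```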
    3. Placing PCM in Servers and Storage Systems
       [Diagram: placement options for PCM. In a server: alongside the DRAM, or on the I/O bus as a hybrid PCIe-attached SSD or a hybrid SAS/SATA-attached SSD behind a RAID controller or HBA. In a storage system: next to the DRAM cache, inside hybrid SSDs, or holding the metadata tables of an all-flash array backed by SSDs and HDDs.]
    4. PSS: A PCM-based PCI-e Prototype Card
       Goal: architect a PCM-based device and implement a fully-functional, high-performance PCI-e card.
       • Take advantage of the characteristics of PCM and mitigate its limitations
       • Target workloads dominated by 4 kB requests
       • Simple, lightweight hardware design
       • System integration of multiple cards through software
       • Emphasis on consistently low, predictable latency
       [Diagram: the host's block device driver sends 4 kB requests over a multi-lane PCI-e bus to a lean PCM controller on the card, which feeds 512 B units into PCM pipes #1..#n driving the PCM channels and chips.]
       • Limited in capacity due to the density of commercially available PCM parts (as of early 2013)
       • Use cases: caching device; metadata store; backend for a low-latency key-value store; tiered storage device in a hybrid configuration with Flash
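The data path on slide 4 (4 kB host requests fanned out as 512 B units to PCM pipe #1..#n) suggests straightforward striping. A minimal sketch, assuming eight pipes and round-robin placement; the deck does not specify the actual dispatch policy:

```python
# Sketch: splitting a 4 kB page into 512 B sectors and assigning them to
# PCM pipes round-robin. Pipe count and placement policy are assumptions;
# the deck only shows "PCM pipe #1..#n" fed with 512 B units.

PAGE_SIZE = 4096
SECTOR_SIZE = 512
NUM_PIPES = 8  # hypothetical; e.g., one per PCM channel on the card

def split_page(page_lba: int, data: bytes):
    """Yield (pipe, sector_lba, payload) tuples for one 4 kB page write."""
    assert len(data) == PAGE_SIZE
    sectors_per_page = PAGE_SIZE // SECTOR_SIZE        # 8 sectors
    base = page_lba * sectors_per_page
    for i in range(sectors_per_page):
        payload = data[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]
        pipe = (base + i) % NUM_PIPES                  # round-robin striping
        yield pipe, base + i, payload

# Example: a page write fans out across all eight pipes, so the eight
# 512 B programs proceed in parallel instead of serializing on one channel.
for pipe, lba, _ in split_page(page_lba=7, data=bytes(PAGE_SIZE)):
    print(f"pipe {pipe}: sector {lba}")
```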
    5. PCM Parts: Micron P5Q
       • 90 nm technology node
       • 128 Mbit devices (NP5Q128AE3ESFC0E)
       • SPI-bus-compatible serial interface
       • Maximum clock frequency: 66 MHz
       • 64-byte write buffer: 120 μsec average program time, about 0.5 MB/s write bandwidth
       • Block transfer time: 8.24 μsec (64 + 4 bytes)
       • Sector I/O (512 B + 64 B): write 1.15 msec (0.86 kIOPS); read 75.24 μsec (13.29 kIOPS)
       • Very asymmetric read/write performance
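The slide 5 figures are internally consistent and can be reproduced from the interface parameters alone; the only quantity that has to be inferred is roughly 1 μsec of per-sector command overhead on reads:

```python
# Reproducing the P5Q timing figures on slide 5 from first principles.

CLOCK_HZ = 66e6          # maximum SPI clock
BLOCK_BYTES = 64 + 4     # 64 data bytes + 4 command/address bytes per block
T_PROGRAM = 120e-6       # average write-buffer program time

t_block = BLOCK_BYTES * 8 / CLOCK_HZ
print(f"block transfer:  {t_block * 1e6:.2f} us")    # ~8.24 us (matches)

# A 512 B sector plus 64 B of metadata is 9 blocks of 64 B each.
blocks_per_sector = (512 + 64) // 64                 # 9
t_read = blocks_per_sector * t_block
print(f"sector read:     {t_read * 1e6:.2f} us")     # ~74.2 us; the quoted
                                                     # 75.24 us implies ~1 us
                                                     # of extra command overhead
t_write = blocks_per_sector * (t_block + T_PROGRAM)
print(f"sector write:    {t_write * 1e3:.2f} ms")    # ~1.15 ms (matches)

bw_write = 64 / T_PROGRAM
print(f"write bandwidth: {bw_write / 1e6:.2f} MB/s") # ~0.53 MB/s (matches)
```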
    6. 2D PCM Channel Architecture
       [Diagram: a PCM channel controller with a data buffer drives an NS × NI array of PCM devices PCM(1,1)..PCM(NS,NI). Devices that share I/O pins and an FSM form a sub-channel; corresponding devices across sub-channels form sub-banks.]
    7. Read vs. Write Performance Trade-Offs
       • A high degree of pipelining increases write performance, since the long programming times overlap
       • It may, however, reduce read performance: read times are very short to begin with, and fewer I/O pins per device mean less transfer parallelism
       • For a given budget of I/O pins: more sub-channels → better read performance; more sub-banks → better write performance
       • Application needs should drive the configuration; channels with different geometries are possible in the same device
       • We chose a configuration that minimizes write latency without severely penalizing reads (a first-order model of this trade-off is sketched below)
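The trade-off can be made concrete with a first-order timing model: reads are bound by block transfers, which parallelize across sub-channels, while writes are bound by the 120 μsec program time, which pipelines across sub-banks. A sketch using the single-bit SPI timings from slide 5 and a fixed budget of nine devices per channel; the real channel card uses wider sub-channel buses, so absolute numbers differ from slide 9:

```python
# First-order model of the sub-channel vs. sub-bank trade-off on slide 7.
# Assumptions: a sector is 9 blocks; each sub-channel transfers one block
# at a time; programs on different sub-banks overlap fully; the channel
# has a fixed budget of 9 devices split between sub-channels and sub-banks.
# Timing constants use the single-bit SPI figures from slide 5.
import math

T_BLOCK = 8.24e-6      # transfer time per 64 B block
T_PROGRAM = 120e-6     # program time per 64 B block
BLOCKS = 9             # 512 B sector + 64 B metadata

def sector_read_us(n_subchannels: int) -> float:
    # Blocks stream in parallel across sub-channels, serially within one.
    return math.ceil(BLOCKS / n_subchannels) * T_BLOCK * 1e6

def write_rate_ksec(n_subchannels: int, n_subbanks: int) -> float:
    # Steady state: transfers serialize on the sub-channels, while up to
    # n_subbanks programs are in flight, hiding the 120 us program time.
    t_transfer = math.ceil(BLOCKS / n_subchannels) * T_BLOCK
    t_per_sector = max(t_transfer, T_PROGRAM / n_subbanks)
    return 1e-3 / t_per_sector

for ns, nb in [(1, 9), (3, 3), (9, 1)]:
    print(f"{ns} sub-channels x {nb} sub-banks: "
          f"read {sector_read_us(ns):6.1f} us, "
          f"write {write_rate_ksec(ns, nb):5.1f} ksectors/s")
```

Running it shows the directional claim of slide 7: the read latency shrinks linearly with sub-channels, while write throughput peaks when enough sub-banks hide the program time without starving the transfer bus.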
    8. 3x3 Channel Architecture
       • 1 block = 64 bytes
       • For each 512 B user sector (8 × 64 B) we store 64 bytes of metadata, i.e., 9 blocks in total: 3 blocks go to each of the 3 sub-channels
       [Diagram: the PCM channel controller with its data buffer and per-sub-channel FSMs drives a 3 × 3 array of PCM devices PCM(1,1)..PCM(3,3), forming one PCM bank.]
    9. PSS Channel Card
       • One PCM channel per side, two PCM channels per card
       • 1 PCM channel = 2 banks (2 × 3 × 3 devices)
       [Diagram: two 3 × 3 device arrays; each device exposes CS/, Din, CLK and Dout pins; a 3-to-8 decoder generates the chip selects CS[1]..CS[3], with a shared CLK and two data lines per sub-channel (D1[1:0], D2[1:0], D3[1:0]).]
       • PCM channel specs: data transfer rate 49.5 MB/s; sector read time 13.8 μsec (61.6 ksectors/sec); sector write time 133.8 μsec (14.8 ksectors/sec)
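The channel specs line up with the pin-out if each sub-channel transfers 2 bits per 66 MHz clock, which the D1[1:0] labels suggest; this is an inference rather than something the deck states. A back-of-envelope check:

```python
# Plausibility check of the slide 9 channel specs. Assumption: each of the
# 3 sub-channels transfers 2 bits per 66 MHz clock (suggested by D1[1:0]).

CLOCK_HZ = 66e6
SUBCHANNELS = 3
BITS_PER_CLK = 2
T_PROGRAM = 120e-6                       # P5Q program time, slide 5

rate = SUBCHANNELS * BITS_PER_CLK * CLOCK_HZ / 8
print(f"transfer rate: {rate / 1e6:.1f} MB/s")   # 49.5 MB/s (matches)

# Sector read: 9 blocks of 64+4 bytes, 3 blocks in series per sub-channel.
t_block = (64 + 4) * 8 / (BITS_PER_CLK * CLOCK_HZ)
t_read = 3 * t_block
print(f"sector read: {t_read * 1e6:.1f} us")     # ~12.4 us vs. the quoted
                                                 # 13.8 us (command overhead)

# Sector write: one transfer phase plus one program phase.
t_write = 13.8e-6 + T_PROGRAM
print(f"sector write: {t_write * 1e6:.1f} us")   # 133.8 us (matches exactly)

# Two banks per channel can program back-to-back, so the write rate is
# roughly 2 / 133.8 us = 14.9 ksectors/s, close to the quoted 14.8.
```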
    10. PSS PCI-e Card
        • Xilinx Zynq-7045 FPGA board carrying the PCM channel modules; two host-attached PSS cards
        • 8 PCM channels with pipeline support per card
        • Error correction based on simple BCH codes: 6 BCH codewords per 512-byte sector, with a 4-bit error correction capability per codeword
        • Wear leveling using a Start-Gap scheme (sketched below)
        • 512 MB of DRAM, mostly used as a write cache
        • Support for cached writes, direct writes with early completion, and direct writes with late completion
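Start-Gap (Qureshi et al., MICRO 2009) is a published wear-leveling scheme: one spare line plus two registers slowly rotate the logical-to-physical mapping so that no physical line absorbs a disproportionate share of writes. A minimal sketch of the idea; PSS's actual parameters, such as the gap-movement interval ψ, are not given in the deck:

```python
# Minimal Start-Gap wear leveling, the scheme slide 10 names. N logical
# lines map onto N+1 physical lines; one physical line is always an unused
# "gap". Every PSI writes the gap moves by one slot, slowly rotating the
# mapping so that hot logical lines spread across the whole device.

class StartGap:
    def __init__(self, n_lines: int, psi: int = 100):
        self.n = n_lines              # logical lines
        self.start = 0                # rotation offset
        self.gap = n_lines            # physical index of the gap line
        self.psi = psi                # writes between gap movements
        self.writes = 0
        self.phys = [None] * (n_lines + 1)

    def _map(self, la: int) -> int:
        """Logical address -> physical address, skipping over the gap."""
        pa = (la + self.start) % self.n
        return pa + 1 if pa >= self.gap else pa

    def read(self, la: int):
        return self.phys[self._map(la)]

    def write(self, la: int, value):
        self.phys[self._map(la)] = value
        self.writes += 1
        if self.writes % self.psi == 0:
            self._move_gap()

    def _move_gap(self):
        if self.gap == 0:
            # Gap wraps: the line in the top slot relocates to slot 0 and
            # the whole mapping has rotated by one, recorded in `start`.
            self.phys[0] = self.phys[self.n]
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # Shift the neighbouring line into the gap; gap moves down one.
            self.phys[self.gap] = self.phys[self.gap - 1]
            self.gap -= 1

# Self-check: data stays addressable while the mapping rotates underneath.
sg = StartGap(n_lines=8, psi=3)
for i in range(8):
    sg.write(i, f"line{i}")
for _ in range(200):                  # hammer one logical line
    sg.write(5, "hot")
assert all(sg.read(i) == f"line{i}" for i in range(8) if i != 5)
assert sg.read(5) == "hot"
```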
    11. PSS Experimental Results
        [Graph: throughput versus offered load for 4 kB pages.]
    12. PSS Experimental Results: I/O Completion Latency Distribution
        [Graphs: latency distributions for random reads and cached random writes; the write distribution is annotated with the PCM technology programming time.]
    13. Latency distribution comparison to Flash-based devices
        • Devices: PSS PCI-e card; MLC Flash PCI-e SSD 1; MLC Flash PCI-e SSD 2; TLC Flash SATA SSD
        • Experiment: per-I/O latency measurements for 2 hours of uniformly random 4 kB writes at QD=1, after 12 hours of preconditioning with the same workload (a sketch of the measurement loop follows)
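A measurement loop of this shape reproduces the methodology: synchronous (QD=1) uniformly random 4 kB writes, each timed individually. A minimal Linux sketch; the device path and run length are placeholders, not the original test setup:

```python
# Sketch of the slide 13 methodology: per-I/O latency for uniformly random
# 4 kB writes at queue depth 1. The device path and runtime are placeholders;
# the original experiment ran for 2 h after 12 h of preconditioning.
import mmap, os, random, time

DEV = "/dev/nvme0n1"          # hypothetical device under test
IO_SIZE = 4096
RUNTIME_S = 10                # original: 7200 s

fd = os.open(DEV, os.O_RDWR | os.O_DIRECT)   # bypass the page cache
dev_bytes = os.lseek(fd, 0, os.SEEK_END)
buf = mmap.mmap(-1, IO_SIZE)                 # page-aligned, as O_DIRECT needs
buf.write(os.urandom(IO_SIZE))

latencies = []
deadline = time.monotonic() + RUNTIME_S
while time.monotonic() < deadline:
    # QD=1: issue one write, wait for completion, record its latency.
    offset = random.randrange(dev_bytes // IO_SIZE) * IO_SIZE
    t0 = time.perf_counter()
    os.pwrite(fd, buf, offset)
    latencies.append(time.perf_counter() - t0)
os.close(fd)

latencies.sort()
for p in (0.5, 0.999, 1.0):   # median, 99.9th percentile, maximum
    idx = min(int(p * len(latencies)), len(latencies) - 1)
    print(f"p{p * 100:g}: {latencies[idx] * 1e6:.0f} us")
```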
    14. Latency Profile up to 1 msec
        [Graph: write latency distributions up to 1 msec for PSS, MLC1, MLC2 and TLC.]
    15. Latency Profile up to 10 msec
        [Graph: write latency distributions up to 10 msec for PSS, MLC1, MLC2 and TLC.]
    16. Total Latency Profile
        [Graph: complete write latency distributions for PSS, MLC1, MLC2 and TLC.]
    17. PCM Endurance Measurements
        The effect of PCM aging on write time and BER
        [Graph: minimum, mean and maximum write time per sector versus write cycles.]
        • Experimental parameters: random data; 32 sectors per write cycle; 4 PCM channels; 8 PCM banks; pipeline active
        • Experiment: perform 10K write cycles with random data (32 sectors each); each cycle writes, reads back and compares a set of 32 sectors
    18. PCM Write Latency Distribution
        • Experimental parameters: random data; 32 sectors per write cycle; 4 PCM channels; 8 PCM banks/controllers; pipeline active
        • Experiment: perform 10K write cycles with random data (32 sectors each); each cycle writes, reads back and compares a set of 32 sectors (the write-read-compare loop is sketched below)
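The write-read-compare cycle behind slides 17-18 can be expressed compactly. A sketch of the harness loop; the two device-access functions are in-memory stand-ins, since the deck does not show the actual test interface:

```python
# Sketch of the endurance experiment: repeatedly program a set of sectors
# with random data, read back, and count bit errors, tracking per-sector
# write time (min/mean/max, as plotted on slide 17) and raw BER as the
# cells age. write_sector/read_sector are stand-ins for the PSS harness.
import os, time

SECTOR = 512
SECTORS_PER_CYCLE = 32
CYCLES = 10_000               # as in the deck

_store = {}                   # in-memory stand-in for the device

def write_sector(lba: int, data: bytes) -> None:
    _store[lba] = data        # replace with the real PSS harness call

def read_sector(lba: int) -> bytes:
    return _store[lba]        # replace with the real PSS harness call

def bit_errors(a: bytes, b: bytes) -> int:
    """Count differing bits between the written and read-back sector."""
    return sum((x ^ y).bit_count() for x, y in zip(a, b))

for cycle in range(CYCLES):
    patterns = [os.urandom(SECTOR) for _ in range(SECTORS_PER_CYCLE)]
    times = []
    for lba, data in enumerate(patterns):
        t0 = time.perf_counter()
        write_sector(lba, data)
        times.append(time.perf_counter() - t0)
    errors = sum(bit_errors(data, read_sector(lba))
                 for lba, data in enumerate(patterns))
    ber = errors / (SECTORS_PER_CYCLE * SECTOR * 8)
    if cycle % 1000 == 0:
        print(f"cycle {cycle}: write/sector "
              f"min {min(times) * 1e6:.0f} / "
              f"mean {sum(times) / len(times) * 1e6:.0f} / "
              f"max {max(times) * 1e6:.0f} us, raw BER {ber:.2e}")
```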
    19. Conclusions
        • PCM is a promising new memory technology
        • PSS is a PCI-e attached subsystem that mitigates the limitations of current PCM technology
        • The 2D channel architecture allows the designer to trade off read performance for write performance and vice versa
        • PSS achieved good performance: 65k read IOPS @ 35 μsec; 15k write IOPS @ 61 μsec
        • PSS achieved consistently low write latency: 99.9% of requests completed within 240 μsec, 12x and 275x lower than the MLC and TLC Flash SSDs, respectively; the highest observed latency was 2 msec, 7x and 61x lower than the MLC and TLC Flash SSDs, respectively
