
Roman Kaplan, Graduate Student, Technion

Deduplication in Resistive CAM Based SSD

Published in: Business

  1. Deduplication in Resistive CAM Based SSD. Roman Kaplan, Leonid Yavits, Amir Morad, Ran Ginosar, 2015. Presented May 9, 2016.
  2. Outline
       1. What is ReCAM?
       2. What is deduplication? How is it done today?
       3. Deduplication in ReCAM: how is it simpler?
       4. Simulation results
  3. Resistive CAM: what is it?
       • CAM = Content Addressable Memory
           1. Search for data in the entire array
           2. Store the address explicitly → can function like RAM
       • Memristors: [figure]
  4. ReCAM Crossbar [figure]
  5. Resistive CAM: operations. What ReCAM can do:
       1. Compare all its contents to a specific word
       2. Write to specific columns in parallel
       3. Write to specific rows in parallel
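
The three operations listed on this slide can be mimicked in software. The following is only a toy model, not the authors' simulator: the class name `ToyCAM`, the bit-mask interface, and the fixed word width are all assumptions made for illustration. A real ReCAM performs the compare and the writes across all rows at once, whereas this sketch loops over rows sequentially.

```python
from typing import List, Optional


class ToyCAM:
    """Hypothetical software stand-in for a CAM array; one fixed-width word per row."""

    def __init__(self, num_rows: int, width_bits: int) -> None:
        self.width_mask = (1 << width_bits) - 1
        self.rows: List[int] = [0] * num_rows

    def compare(self, key: int, mask: Optional[int] = None) -> List[bool]:
        """Compare every row to `key` on the selected bit columns.

        Returns one match flag per row. Hardware would do this in parallel.
        """
        m = self.width_mask if mask is None else mask & self.width_mask
        return [(row & m) == (key & m) for row in self.rows]

    def write(self, row_select: List[bool], value: int,
              mask: Optional[int] = None) -> None:
        """Write `value` into the selected bit columns of every selected row."""
        m = self.width_mask if mask is None else mask & self.width_mask
        for i, selected in enumerate(row_select):
            if selected:
                # Keep the unselected columns, overwrite the selected ones.
                self.rows[i] = (self.rows[i] & ~m) | (value & m)


cam = ToyCAM(num_rows=4, width_bits=8)
cam.write([True, False, True, False], 0xAB)   # write whole words to rows 0 and 2
tags = cam.compare(0xAB)                      # which rows hold 0xAB?
cam.write(tags, 0x0F, mask=0x0F)              # update only the low 4 columns of the matches
```

Feeding the match flags from `compare` back into `write` is what lets a search-then-update step touch every matching row without an index structure.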
  6. What is Deduplication?
       1. Data is broken into fixed blocks
       2. A fingerprint (FP) is calculated for each block
  7. What is Deduplication?
       1. Data is broken into fixed blocks
       2. A fingerprint (FP) is calculated for each block
       3. Identical blocks aren’t stored (deduplicated)
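
The three steps above can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 as the fingerprint function and a 4KB block size; the slides do not say which hash is used, and the function names are made up for this example.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096  # assumed block size; the simulations in the deck use 1KB to 8KB


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Step 1: break the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def fingerprint(block: bytes) -> str:
    """Step 2: compute a fingerprint for a block (SHA-256 is an assumption)."""
    return hashlib.sha256(block).hexdigest()


def deduplicate(data: bytes,
                block_size: int = BLOCK_SIZE) -> Tuple[Dict[str, bytes], List[str]]:
    """Step 3: store each distinct block once and keep an ordered list of fingerprints."""
    store: Dict[str, bytes] = {}   # fingerprint -> block contents
    recipe: List[str] = []         # fingerprints in original order
    for block in split_into_blocks(data, block_size):
        fp = fingerprint(block)
        store.setdefault(fp, block)  # identical blocks are not stored again
        recipe.append(fp)
    return store, recipe


data = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
store, recipe = deduplicate(data)
restored = b"".join(store[fp] for fp in recipe)
```

Here three blocks are written but only two are stored, and the ordered fingerprint list is enough to rebuild the original data.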
  8. Deduplication Uses
       1. Useful when there is repeating data
           • Virtual machines
           • WAN optimizations (networking)
           • Backups
       2. Compression ratio depends on the type of data; it can reach up to 40x
  9. Deduplication using RAM+CPU: Write
       1. Calculate FP (hash)
       2. Search for it in the chunk index (takes a very long time)
       3. Act accordingly (next slides)
     [Figure: the incoming data block is hashed and looked up in the chunk index, a table with columns Fingerprint, Physical Address, and CNT, holding entries such as Hash(A) with PA(A), Hash(B) with PA(B), and Hash(C) with PA(C).]
  10. RAM+CPU Deduplication: Write (Case 1)
       Case 1: the FP is found, so the data block already exists
         I. Add LA+PA to ATT
         II. Increment FP counter in chunk index
     [Figure: chunk index (Fingerprint, Physical Address, CNT), address decoder, data blocks storage holding A, B, C, D, and the Address Translation Table (Logical Address, Physical Address). A second logical address LA2(D) is mapped to the existing PA(D), and the CNT of Hash(D) goes from 1 to 2.]
  11. RAM+CPU Deduplication: Write (Case 2)
       Case 2: the FP is not found, so this is a unique data block
         I. Write block to storage
         II. Add LA+PA to ATT
         III. Add FP to chunk index
     [Figure: chunk index (Fingerprint, Physical Address, CNT), address decoder, data blocks storage, and the Address Translation Table (Logical Address, Physical Address). The new block D is written at PA(D), Hash(D) is added to the chunk index with CNT 1, and LA(D) is mapped to PA(D).]
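
The two write cases for the RAM+CPU scheme can be sketched as follows. This is a simplified illustration, not the design of any particular product: SHA-256 as the fingerprint, Python dictionaries for the chunk index and ATT, and the class name `HashDedupStore` are all assumptions. It also assumes each logical address is written only once and that a fingerprint match implies identical data.

```python
import hashlib
from typing import Dict, List


class HashDedupStore:
    """Hypothetical model of hash-based deduplication with a chunk index and an ATT."""

    def __init__(self) -> None:
        self.chunk_index: Dict[str, List[int]] = {}  # FP -> [PA, CNT]
        self.att: Dict[int, int] = {}                # LA -> PA
        self.storage: Dict[int, bytes] = {}          # PA -> data block
        self._next_pa = 0

    def write(self, la: int, block: bytes) -> int:
        fp = hashlib.sha256(block).hexdigest()       # calculate FP (assumed hash)
        entry = self.chunk_index.get(fp)             # search the chunk index
        if entry is not None:
            # Case 1: FP found, the block already exists.
            pa = entry[0]
            entry[1] += 1                            # increment FP counter
        else:
            # Case 2: FP not found, a unique block.
            pa = self._next_pa
            self._next_pa += 1
            self.storage[pa] = block                 # write block to storage
            self.chunk_index[fp] = [pa, 1]           # add FP to chunk index
        self.att[la] = pa                            # add LA+PA to ATT (both cases)
        return pa


store = HashDedupStore()
store.write(0, b"block A")
store.write(1, b"block B")
store.write(2, b"block A")   # duplicate: only the ATT and the counter change
```

Even in this reduced form, every write touches three structures (chunk index, ATT, storage), which is the bookkeeping the following slides argue against.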
  12. Deduplication is Hard with RAM+CPU
       • Delete is even more complicated than write
       • Requires complex data structures & computations → large memory & CPU
       • Example: EMC XtremIO Xbrick
           • 5TB all-flash storage
           • 256GB RAM
           • Quad-core CPU
  13. Deduplication in ReCAM
       • Much simpler than with RAM
       • The chunk index is no longer required
       • Allows comparing all data blocks in storage simultaneously
           • If a match is found, store only address pointers
  14. Deduplication in ReCAM
       1. Search for the new data block in the storage
       2. Act accordingly (next slides)
     [Figure: the incoming data block is compared against data blocks A, B, C stored at physical addresses PA(A), PA(B), PA(C).]
  15. Deduplication in ReCAM
       Case 1: the data is found, so the data block already exists
         I. Add address to ATT
     [Figure: storage holds data blocks A, B, C, D at PA(A), PA(B), PA(C), PA(D); the Address Translation Table (Logical Address, Physical Address) maps LA(A), LA(B), LA(C), LA(D), and a second logical address LA2(D) is mapped to the existing PA(D).]
  16. Deduplication in ReCAM
       Case 2: the data is not found, so this is a new data block
         I. Write data to storage
         II. Add address to ATT
     [Figure: the new block D is written to storage at PA(D), and LA(D) is mapped to PA(D) in the Address Translation Table (Logical Address, Physical Address).]
  17. Deduplication in ReCAM: Much Simpler than with RAM
       • Write:
           1. Compare against the entire array's data simultaneously
           2. If there is a match, save only a pointer
           3. If not, save the data block + pointer
       • Delete isn’t more complicated than write
           • If no addresses point to the data → delete it
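
The write and delete rules above can be modeled in software as a rough sketch. The class name `ContentSearchStore` and the dictionary layout are assumptions for illustration. The sequential loop in `search` stands in for the single parallel compare a ReCAM would perform over the whole array, so this shows the logic only, not the performance.

```python
from typing import Dict, Optional


class ContentSearchStore:
    """Hypothetical model of deduplication by searching stored data directly."""

    def __init__(self) -> None:
        self.storage: Dict[int, bytes] = {}  # PA -> data block
        self.att: Dict[int, int] = {}        # LA -> PA
        self._next_pa = 0

    def search(self, block: bytes) -> Optional[int]:
        """Return the PA of an identical stored block, or None.

        In ReCAM this is one compare over all rows; here it is a linear scan.
        """
        for pa, stored in self.storage.items():
            if stored == block:
                return pa
        return None

    def write(self, la: int, block: bytes) -> int:
        pa = self.search(block)
        if pa is None:
            # No match: save the data block.
            pa = self._next_pa
            self._next_pa += 1
            self.storage[pa] = block
        self.att[la] = pa                    # in both cases, save the pointer
        return pa

    def delete(self, la: int) -> None:
        pa = self.att.pop(la)
        if pa not in self.att.values():      # no addresses point to the data
            del self.storage[pa]


store = ContentSearchStore()
store.write(0, b"block A")
store.write(1, b"block A")   # match: only a pointer is saved
store.delete(0)              # block A stays, LA 1 still points to it
store.delete(1)              # last pointer gone, block A is deleted
```

There is no fingerprint and no chunk index here: the stored data itself is the search key, and the only metadata left is the address translation table.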
  18. Simulations
       • ReCAM
           • Cycle-accurate simulator: size = 256GB, clock = 1GHz
           • SPICE → per-cycle power + performance
       • OpenDedup for comparison
           • Intel PCM for CPU+DRAM energy
           • Only deduplication energy was measured
           • Per-block processing time for performance
       • 50GB of writes
           • Varying % of duplicate data
  19. Simulations: ReCAM vs. OpenDedup
     [Figure: Throughput vs. duplicate %. x-axis: percentage of deduplicated blocks (0 to 100); y-axis: peak write performance (IOPS), log scale from 10^4 to 10^8; series: ReCAM and OpenDedup (OPNDDP) at 1KB, 2KB, 4KB, and 8KB block sizes.]
  20. Simulations: ReCAM & OpenDedup
     [Figure: Energy vs. duplicate %. x-axis: percentage of deduplicated blocks (0 to 100); y-axis: energy consumption (Joule), log scale from 10^3 to 10^5; series: ReCAM and OpenDedup (OPNDDP) at 1KB, 2KB, 4KB, and 8KB block sizes.]
  21. Conclusions
       • ReCAM has 100x higher throughput than deduplication with RAM+CPU
       • Energy consumption is similar or lower for the common block sizes (4KB and 8KB)
       • Can be used as a cache in hybrid storage systems
       • Future technology may allow for TBs of storage on a single chip
  22. Thank you. Questions?
