Roman Kaplan, Graduate Student, Technion

Deduplication in Resistive CAM Based SSD

Published in: Business

  1. Deduplication in Resistive CAM Based SSD. Roman Kaplan, Leonid Yavits, Amir Morad, Ran Ginosar, 2015. May 9, 2016
  2. Outline
     1. What is ReCAM?
     2. What is deduplication?
        – How is it done today?
     3. Deduplication in ReCAM
        – How is it simpler?
     4. Simulation results
  3. Resistive CAM – What is it?
     • CAM = Content Addressable Memory
       1. Search for data in the entire array
       2. Store address explicitly → function like RAM
     • Memristors:
  4. ReCAM Crossbar
  5. Resistive CAM – Operations
     What can ReCAM do:
     1. Compare all its contents to a specific word
     2. Write to specific columns in parallel
     3. Write to specific rows in parallel
  6. What is Deduplication?
     1. Data is broken into fixed-size blocks
     2. A fingerprint (FP) is calculated for each block
  7. What is Deduplication?
     1. Data is broken into fixed-size blocks
     2. A fingerprint (FP) is calculated for each block
     3. Identical blocks aren’t stored (deduplicated)
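
The three steps above can be sketched in a few lines of Python. This is only an illustration: the slides do not name the hash function, so SHA-256 is an assumption, as are the 4 KB block size and the names `split_blocks`, `fingerprint`, and `deduplicate`.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; not fixed by the slides


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Step 1: break the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block: bytes) -> str:
    """Step 2: compute a fingerprint for a block (SHA-256 is an assumption)."""
    return hashlib.sha256(block).hexdigest()


def deduplicate(data: bytes, size: int = BLOCK_SIZE):
    """Step 3: store each distinct block once and keep an ordered list of fingerprints."""
    store = {}    # fingerprint -> block contents, one entry per unique block
    recipe = []   # fingerprints in original order, enough to rebuild the data
    for block in split_blocks(data, size):
        fp = fingerprint(block)
        store.setdefault(fp, block)  # identical blocks are not stored again
        recipe.append(fp)
    return store, recipe
```

Joining `store[fp]` for every `fp` in `recipe` reproduces the original data, while `store` holds only the unique blocks.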
  8. Deduplication Uses
     1. Useful when there is repeating data
        – Virtual machines
        – WAN optimizations (networking)
        – Backups
     2. Compression ratio depends on the type of data
        – can reach up to 40x
  9. Deduplication using RAM+CPU: Write
     1. Calculate FP (hash)
     2. Search for it in the chunk index (takes a very long time)
     3. Act accordingly (next slides)
     [Figure: a data block is hashed and looked up in the chunk index; chunk index columns: Fingerprint, Physical Address, CNT; rows: Hash(A), Hash(B), Hash(C)]
  10. RAM+CPU Deduplication: Write (Case 1)
      Case 1: the FP is found, so the data block already exists
      I. Add the logical address (LA) and physical address (PA) to the address translation table (ATT)
      II. Increment the FP counter in the chunk index
      [Figure: chunk index (Fingerprint, Physical Address, CNT), address decoder, data block storage, and address translation table (Logical Address, Physical Address); a second logical address LA2(D) is mapped to the existing PA(D)]
  11. RAM+CPU Deduplication: Write (Case 2)
      Case 2: the FP is not found, so this is a unique data block
      I. Write the block to storage
      II. Add LA+PA to the ATT
      III. Add the FP to the chunk index
      [Figure: chunk index (Fingerprint, Physical Address, CNT), address decoder, data block storage, and address translation table (Logical Address, Physical Address); block D is written at PA(D) and LA(D) is mapped to it]
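
A minimal sketch of the two write cases above, assuming an in-memory dictionary stands in for the chunk index and a Python list stands in for block storage. The class name `ChunkIndexStore`, the use of SHA-256, and the use of list positions as physical addresses are illustrative assumptions, not details from the slides. Overwriting an already mapped logical address is deliberately left out.

```python
import hashlib


class ChunkIndexStore:
    """Toy model of RAM+CPU deduplication (hypothetical names throughout)."""

    def __init__(self):
        self.chunk_index = {}  # fingerprint -> [physical address, reference count]
        self.att = {}          # address translation table: logical -> physical
        self.blocks = []       # block storage; the list index plays the role of the PA

    def write(self, la: int, block: bytes) -> int:
        fp = hashlib.sha256(block).digest()  # hash choice is an assumption
        entry = self.chunk_index.get(fp)
        if entry is not None:
            # Case 1: FP found, the block already exists; only bump the counter
            entry[1] += 1
            pa = entry[0]
        else:
            # Case 2: FP not found, a unique block; store it and index its FP
            pa = len(self.blocks)
            self.blocks.append(block)
            self.chunk_index[fp] = [pa, 1]
        self.att[la] = pa  # both cases add LA+PA to the ATT
        return pa
```

Writing the same block under two logical addresses leaves one stored copy, two ATT entries, and a counter of 2 in the chunk index.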
  12. Deduplication is Hard with RAM+CPU
      • Delete is even more complicated than write
      • Requires complex data structures & computations: large memory & CPU
      • Example: EMC XtremIO Xbrick
        • 5TB all-flash storage
        • 256GB RAM
        • Quad-core CPU
  13. Deduplication in ReCAM
      • Much simpler than with RAM
      • The chunk index is no longer required
      • Allows comparing all data blocks in storage simultaneously
        – If found, store only address pointers
  14. Deduplication in ReCAM
      1. Search for the new data block in the storage
      2. Act accordingly (next slides)
      [Figure: the new data block is compared against data block storage; columns: Physical Address, Data Blocks; rows: PA(A), PA(B), PA(C)]
  15. Deduplication in ReCAM
      Case 1: the data is found, so the data block already exists
      I. Add address to ATT
      [Figure: data block storage (Physical Address, Data Blocks) and address translation table (Logical Address, Physical Address); a second logical address LA2(D) is mapped to the existing PA(D)]
  16. Deduplication in ReCAM
      Case 2: the data is not found, so this is a new data block
      I. Write data to storage
      II. Add address to ATT
      [Figure: data block storage (Physical Address, Data Blocks) and address translation table (Logical Address, Physical Address); block D is written at PA(D) and LA(D) is mapped to it]
  17. Deduplication in ReCAM: Much Simpler than with RAM
      • Write:
        1. Compare the entire array data simultaneously
        2. If match, save only a pointer
        3. If not, save the data block + pointer
      • Delete isn’t more complicated than write
        – If no addresses point to the data → delete
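
The write and delete rules above can be mimicked in software to show why no chunk index is needed. This is a rough functional sketch only: the loop in `search` stands in for the single parallel compare that ReCAM performs in hardware, and the class name `ContentAddressedStore` and its fields are assumptions made for illustration.

```python
class ContentAddressedStore:
    """Functional model of ReCAM-style deduplication (hypothetical names)."""

    def __init__(self):
        self.rows = {}    # physical address -> stored data block
        self.att = {}     # address translation table: logical -> physical
        self.next_pa = 0  # next free physical address (simplified allocator)

    def search(self, block: bytes):
        # ReCAM would compare every row against the block simultaneously;
        # this sequential loop only emulates the result of that compare.
        for pa, stored in self.rows.items():
            if stored == block:
                return pa
        return None

    def write(self, la: int, block: bytes) -> int:
        pa = self.search(block)
        if pa is None:
            # No match: save the data block, then the pointer
            pa = self.next_pa
            self.next_pa += 1
            self.rows[pa] = block
        self.att[la] = pa  # match or not, a pointer is always saved
        return pa

    def delete(self, la: int) -> None:
        pa = self.att.pop(la)
        # If no addresses point to the data any more, delete the block itself
        if pa not in self.att.values():
            del self.rows[pa]
```

Compared with the chunk-index approach, there is no fingerprint table or reference counter to maintain: the stored data itself is the search key.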
  18. Simulations
      • ReCAM
        – Cycle-accurate simulator: size = 256GB, clock = 1GHz
        – SPICE → per-cycle power + performance
      • OpenDedup for comparison
        – Intel PCM for CPU+DRAM energy
        – Only deduplication energy was measured
        – Per-block processing time for performance
      • 50GB of writes
        – Varying % of duplicate data
  19. Simulations – ReCAM vs. OpenDedup: Throughput vs. duplicate %
      [Plot: peak write performance (IOPS, log scale, 10^4 to 10^8) vs. percentage of deduplicated blocks (0 to 100); series: ReCAM and OpenDedup (OPNDDP), each at 1KB, 2KB, 4KB, and 8KB block sizes]
  20. Simulations – ReCAM & OpenDedup: Energy vs. duplicate %
      [Plot: energy consumption (Joule, log scale, 10^3 to 10^5) vs. percentage of deduplicated blocks (0 to 100); series: ReCAM and OpenDedup (OPNDDP), each at 1KB, 2KB, 4KB, and 8KB block sizes]
  21. Conclusions
      • ReCAM has 100x higher throughput than deduplication with RAM+CPU
      • Energy consumption is similar or lower for the common block sizes (4KB & 8KB)
      • Can be used as cache in hybrid storage systems
      • Future technology may allow for TBs of storage on a single chip
  22. Thank you. Questions?
