
Dedupe nmamit


  1. Deduplication in Storage Systems ● Joseph Fernandes ● Ewen Pinto ● Srinivas Billava
  2. Who Are We? ● Joseph Fernandes (Senior Engineer, Red Hat Storage) ● Ewen Pinto (VI Sem MCA, NMAMIT, Nitte) ● Srinivas Billava (VI Sem MCA, NMAMIT, Nitte)
  3. Agenda ● What Is Dedupe? ● Why Dedupe? ● Types of Dedupe ● What Is Deduped? ● Where Is It Deduped? ● When Is It Deduped? ● Challenges in Dedupe ● Current Work
  4. What Is Deduplication? An intelligent way of storing data: redundant copies of data are removed and only one instance is stored.
  5. What Is Deduplication? ● Data units are identified by a hash index ● Redundant data units are replaced by pointers ● A hash algorithm with minimal collisions is used
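The three bullets above can be sketched as a tiny hash-indexed store. This is a minimal illustration under stated assumptions, not how any particular system implements it: SHA-256 stands in for the low-collision hash (the slides do not name one), and the `DedupeStore` class with its `put`/`get` methods is hypothetical.

```python
import hashlib

class DedupeStore:
    """Hypothetical hash-indexed store: each unique data unit is kept once,
    keyed by its SHA-256 digest (an assumed choice of hash)."""

    def __init__(self):
        self.units = {}  # digest -> bytes (the single stored instance)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Store the data only if this digest has not been seen before.
        self.units.setdefault(digest, data)
        return digest  # the "pointer" the caller keeps instead of the data

    def get(self, digest: str) -> bytes:
        return self.units[digest]


store = DedupeStore()
p1 = store.put(b"hello world")
p2 = store.put(b"hello world")  # redundant copy: only a pointer is returned
p3 = store.put(b"something else")
print(p1 == p2, len(store.units))  # True 2
```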
  6. Why Dedupe? ● Reduces Total Cost of Ownership (TCO) of storage and network ● Used in backup/archive, disaster recovery, and local/remote replication
  7. What Is Deduped? ● File level (single instancing) [Figure: File 1 → HASH 1; File 2]
  8. What Is Deduped? ● File level (single instancing) [Figure: File 1 → HASH 1; File 2 replaced by a pointer]
  9. What Is Deduped? ● File level (single instancing) [Figure: File 1 → HASH 1; File 2 → HASH 2]
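File-level single instancing, as in slides 7 to 9, can be sketched by hashing each whole file: identical files map to the same hash and are stored once, while any change produces a new hash and a full second copy. The `single_instance` helper and the file names are hypothetical; SHA-256 is an assumed hash.

```python
import hashlib

def single_instance(files: dict):
    """File-level dedupe: hash each whole file; identical files share one copy."""
    blobs = {}    # digest -> file contents (stored once)
    catalog = {}  # file name -> digest (acts as the pointer)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        blobs.setdefault(digest, content)
        catalog[name] = digest
    return blobs, catalog

blobs, catalog = single_instance({
    "file1.txt": b"quarterly report",
    "file2.txt": b"quarterly report",     # identical: same hash, stored once
    "file3.txt": b"quarterly report v2",  # different content: new hash, full copy
})
print(len(blobs))  # 2
```

Note the weakness this exposes: a small edit to a large file forces the whole file to be stored again, which is what block-level dedupe addresses.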
  10. What Is Deduped? ● Block level [Figure: File 1 split into blocks B1 B2 B3 B4 B5 B6 with hashes HASH 1 to HASH 6; File 2 made of blocks B1 B1 B3 B4 B4 B6]
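A block-level sketch reproducing the figure's pattern, where File 2 reuses blocks B1, B3, B4, and B6 of File 1. Each file becomes a list of block hashes (a "recipe") pointing into a shared block store. The helper names, the tiny 4-byte block size, and the use of fixed-size blocks are illustrative assumptions.

```python
import hashlib

BLOCK = 4  # tiny block size, for illustration only (assumed)

def dedupe_blocks(data: bytes, store: dict) -> list:
    """Split data into fixed-size blocks, store each unique block once,
    and return the file's recipe: the list of block hashes (pointers)."""
    recipe = []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        recipe.append(digest)
    return recipe

def restore(recipe: list, store: dict) -> bytes:
    """Rebuild a file by following its pointers."""
    return b"".join(store[d] for d in recipe)

b = {n: f"B{n}..".encode() for n in range(1, 7)}  # six distinct 4-byte blocks
file1 = b[1] + b[2] + b[3] + b[4] + b[5] + b[6]
file2 = b[1] + b[1] + b[3] + b[4] + b[4] + b[6]

store = {}
r1 = dedupe_blocks(file1, store)
r2 = dedupe_blocks(file2, store)
print(len(r1) + len(r2), len(store))  # 12 6
```

Twelve block references are backed by only six stored blocks, because every block of File 2 already exists in the store.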
  11. Fixed Block Chunking ● File is divided into equal-length blocks ● Pros: Faster! ● Cons: Not space efficient!
  12. Fixed Block Chunking [Figure: file divided into fixed-size blocks]
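Why fixed block chunking is not space efficient can be shown in a few lines: inserting a single byte at the start of a file shifts every block boundary, so none of the old blocks match. A minimal sketch; `fixed_chunks` is a hypothetical helper and the 8-byte block size is arbitrary.

```python
def fixed_chunks(data: bytes, size: int) -> list:
    """Fixed block chunking: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

original = bytes(range(64))  # 64 bytes of sample data
edited = b"X" + original     # one byte inserted at the front

a = set(fixed_chunks(original, 8))
b = set(fixed_chunks(edited, 8))
print(len(a & b))  # 0: every boundary shifted by one byte
```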
  13. Variable Block Chunking ● File is chunked into variable-length blocks ● Block size is determined by content ● Rolling hash algorithm: Rabin-Karp, computed over a sliding window of n bytes a[0], ..., a[n-1]:
      $$\mathrm{RHash} = p^{n-1}a[0] + p^{n-2}a[1] + p^{n-3}a[2] + \cdots + p\,a[n-2] + a[n-1]$$
      If (RHash & fingerprint) == 0 { Chunk! }
  14. Variable Block Chunking [Figure: file divided into variable-size blocks]
  15. Variable Block Chunking ● Pros: Space efficient! ● Cons: Slower!
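A sketch of content-defined (variable block) chunking using the Rabin-Karp polynomial rolling hash from slide 13. It assumes a 16-byte window, base 257, and the Mersenne prime 2**61 - 1 as modulus, and it treats the slide's `fingerprint` as a bit mask (`MASK`) whose zero test marks a boundary. These parameters, the minimum-chunk-size guard, and the `variable_chunks` name are illustrative choices, not taken from the slides; production systems often use Rabin fingerprints over GF(2) or other rolling hashes instead.

```python
import random

P = 257              # polynomial base p (assumed)
M = (1 << 61) - 1    # modulus: the Mersenne prime 2**61 - 1 (assumed)
WINDOW = 16          # n, the number of bytes in the rolling window (assumed)
MASK = (1 << 6) - 1  # stands in for the slide's `fingerprint`; ~1 in 64 positions cuts

def variable_chunks(data: bytes) -> list:
    """Content-defined chunking with a Rabin-Karp style rolling hash."""
    chunks, start, h = [], 0, 0
    top = pow(P, WINDOW - 1, M)  # weight p^(n-1) of the oldest byte in the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % M  # drop the byte leaving the window
        h = (h * P + byte) % M                    # shift in the newest byte
        # Cut where the masked hash is zero, but never before WINDOW bytes (min chunk size).
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks

rng = random.Random(42)
data = bytes(rng.getrandbits(8) for _ in range(4096))
edited = b"X" + data  # insert one byte at the front

before = variable_chunks(data)
after = set(variable_chunks(edited))
shared = sum(1 for c in before if c in after)
print(f"{shared}/{len(before)} chunks unchanged after a one-byte insert")
```

Because each boundary depends only on the bytes inside the window, boundaries resynchronize shortly after the edit and most chunks are shared, unlike the fixed-block case. The price is a hash update at every byte, which is the "Slower!" on slide 15.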
  16. Where Is It Deduped? ● Client side ● Pros: Less network traffic ● Cons: Heavier clients (CPU/memory, metadata storage)
  17. Where Is It Deduped? ● Server side ● Pros: Lighter clients ● Cons: More network traffic
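The client-side versus server-side trade-off can be sketched as a simple exchange: a client-side deduper first sends block hashes and uploads only the blocks the server lacks, whereas a server-side setup ships every block and dedupes on arrival. `Server`, `missing`, `upload`, and the two backup functions are hypothetical, and the byte counts ignore the (small) cost of sending the hashes themselves.

```python
import hashlib

def h(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

class Server:
    """Hypothetical dedupe target that keeps one copy per block hash."""
    def __init__(self):
        self.blocks = {}

    def missing(self, hashes):
        # Report which hashes the server has never seen.
        return [x for x in hashes if x not in self.blocks]

    def upload(self, blocks):
        for blk in blocks:
            self.blocks[h(blk)] = blk

def client_side_backup(blocks, server):
    """Client hashes locally, asks what is missing, sends only new blocks.
    Returns payload bytes sent over the network."""
    wanted = set(server.missing([h(blk) for blk in blocks]))
    to_send = []
    for blk in blocks:
        if h(blk) in wanted:
            to_send.append(blk)
            wanted.discard(h(blk))  # send each new block only once
    server.upload(to_send)
    return sum(len(blk) for blk in to_send)

def server_side_backup(blocks, server):
    """Client ships everything; the server dedupes on arrival."""
    server.upload(blocks)
    return sum(len(blk) for blk in blocks)

blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
s1 = Server()
print(client_side_backup(blocks, s1))  # 8192
print(client_side_backup(blocks, s1))  # 0 (second backup: nothing new)
s2 = Server()
print(server_side_backup(blocks, s2))  # 12288
```

Both servers end up storing the same two blocks; the difference is how much data crossed the network and where the hashing CPU was spent.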
  18. When Is It Deduped? ● Inline dedupe (as data is written) ● Offline dedupe (post-process, after data is written)
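Inline and offline dedupe differ in when hashing happens relative to the write. In this minimal sketch (class and method names are hypothetical, SHA-256 is an assumed hash), `InlineStore` dedupes on the write path, while `OfflineStore` accepts raw writes into a staging area and dedupes in a later pass.

```python
import hashlib

class InlineStore:
    """Inline dedupe: hash and dedupe on the write path, before data lands."""
    def __init__(self):
        self.chunks = {}

    def write(self, chunk: bytes) -> str:
        digest = hashlib.sha256(chunk).hexdigest()
        self.chunks.setdefault(digest, chunk)  # a duplicate is never stored
        return digest

class OfflineStore:
    """Offline (post-process) dedupe: writes land raw; a later pass dedupes."""
    def __init__(self):
        self.staging = []  # raw writes, duplicates included
        self.chunks = {}

    def write(self, chunk: bytes) -> None:
        self.staging.append(chunk)  # fast path: no hashing at write time

    def dedupe_pass(self) -> None:
        for chunk in self.staging:
            self.chunks.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
        self.staging.clear()

inline, offline = InlineStore(), OfflineStore()
for c in [b"x", b"y", b"x"]:
    inline.write(c)
    offline.write(c)
print(len(inline.chunks), len(offline.staging))  # 2 3
offline.dedupe_pass()
print(len(offline.chunks), len(offline.staging))  # 2 0
```

The usual trade-off: inline adds hashing latency to every write but never stores duplicates, while offline keeps writes fast at the cost of temporary extra capacity until the dedupe pass runs.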
  19. Challenges in Dedupe ● Single point of failure: every file that references a stored instance depends on that one copy, "Last line of defense! Or fall off the cliff!" ● Performance ● Distributed dedupe
  20. Current Work: YADL ● "Yet Another Dedupe Library" ● Stream-based user-space dedupe library ● File, object, or block ● The future: YADL-E
  21. Current Work: YADL ● https://github.com/YADL/yadl ● Contributors: ● Ewen Pinto (ewenpin@gmail.com) ● Srinivas B (srinivasbillav@gmail.com) ● Karthik US (kus.karthikus9@gmail.com) ● Sukumar Poojary (sukumarpoojari92@gmail.com)
  22. THANK YOU
