Dedupe nmamit

Gluster.org

1
Deduplication in Storage Systems
Joseph Fernandes
Ewen Pinto
Srinivas Billava

2
Who are we?
● Joseph Fernandes (Senior Engineer, Red Hat Storage)
● Ewen Pinto (VI Sem MCA, NMAMIT, Nitte)
● Srinivas Billava (VI Sem MCA, NMAMIT, Nitte)

3
Agenda
● What is Dedupe
● Why Dedupe
● Types of Dedupe
● What is Deduped
● Where it is Deduped
● When it is Deduped
● Challenges in Dedupe
● Current work

4
What is Deduplication?
An intelligent way of storing data: redundant copies are removed
and only one instance of each piece of data is stored.

5
What is Deduplication?
● Data units are identified by a hash index
● Redundant data units are replaced by pointers
● Needs a hash algorithm with a minimal collision rate
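
The three points above can be sketched as a toy in-memory store. This is only an illustration, not how any particular product works: the `DedupeStore` class and its method names are made up for this example, and SHA-256 is assumed as the low-collision hash.

```python
import hashlib


class DedupeStore:
    """Toy in-memory store: each unique data unit is kept once, keyed by its digest."""

    def __init__(self):
        self.units = {}      # digest -> the single stored copy of the data unit
        self.refcount = {}   # digest -> number of pointers handed out

    def put(self, data: bytes) -> str:
        """Store a unit and return its digest, which acts as the pointer."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.units:
            self.units[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def get(self, pointer: str) -> bytes:
        """Follow a pointer back to the stored data unit."""
        return self.units[pointer]


store = DedupeStore()
p1 = store.put(b"hello world")
p2 = store.put(b"hello world")      # redundant copy: only a pointer comes back
p3 = store.put(b"something else")
```

Here `p1` and `p2` are the same pointer, and the store holds two units for three writes.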

6
Why dedupe?
● Reduces Total Cost of Ownership (TCO)
  ● Storage
  ● Network
● Used in
  ● Backup/Archive
  ● Disaster Recovery
  ● Replication (local/remote)

7
What is deduped?
● File Level (Single instancing)
[Diagram: File 1 and File 2 both hash to HASH 1]

8
What is deduped?
● File Level (Single instancing)
[Diagram: File 1 is stored under HASH 1; File 2 becomes a pointer to it]

9
What is deduped?
● File Level (Single instancing)
[Diagram: File 1 has HASH 1; File 2 has HASH 2]
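
A minimal sketch of file-level single instancing, assuming whole-file SHA-256 hashes and files held in memory as a name-to-bytes mapping. The function name `single_instance` and the sample file names are hypothetical.

```python
import hashlib


def single_instance(files: dict):
    """Return (blobs, catalog): blobs holds one copy per distinct content,
    catalog maps every file name to the hash of its content."""
    blobs = {}
    catalog = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        blobs.setdefault(digest, content)  # keep only the first copy
        catalog[name] = digest             # every file name is just a pointer
    return blobs, catalog


files = {
    "file1.txt": b"quarterly report",
    "file2.txt": b"quarterly report",      # identical to file1.txt
    "file3.txt": b"quarterly report v2",   # differs by a few bytes
}
blobs, catalog = single_instance(files)
```

`file1.txt` and `file2.txt` share one stored blob, while `file3.txt` gets a full copy of its own even though it differs only slightly. That limitation is what block-level dedupe addresses.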

10
What is deduped?
● Block Level
[Diagram: File 1 is made of blocks B1 B2 B3 B4 B5 B6, indexed as HASH 1 to HASH 6; File 2 is made of blocks B1 B1 B3 B4 B4 B6, all of which File 1 already contains]
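
The block-level picture can be sketched as follows, using the same block layout as the slide. The block contents, the `store_file`/`read_file` helpers, and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib

block_store = {}  # hash -> block, shared by all files


def store_file(blocks):
    """Store a file given as a list of blocks; return its recipe (list of hashes)."""
    recipe = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        block_store.setdefault(digest, block)  # store each distinct block once
        recipe.append(digest)
    return recipe


def read_file(recipe):
    """Rebuild a file by following its list of block hashes."""
    return b"".join(block_store[d] for d in recipe)


# Six distinct 4-byte blocks standing in for B1..B6
B1, B2, B3, B4, B5, B6 = (bytes([i]) * 4 for i in range(1, 7))
file1 = [B1, B2, B3, B4, B5, B6]
file2 = [B1, B1, B3, B4, B4, B6]

recipe1 = store_file(file1)
stored_after_file1 = len(block_store)
recipe2 = store_file(file2)
```

Storing File 2 adds no new blocks: its recipe only points at blocks that File 1 already contributed.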

11
Fixed Block Chunking
● File is divided into equal-length blocks
● Pros: Faster!
● Cons: Not space efficient!
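
A small sketch of why fixed-size blocks can waste space: inserting a single byte shifts every later block boundary, so no block matches what was stored before. The block size of 4 and the sample data are arbitrary choices for this example.

```python
def fixed_chunks(data: bytes, size: int):
    """Split data into blocks of `size` bytes (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


original = b"AAAABBBBCCCCDDDD"
edited = b"x" + original            # one byte inserted at the front

before = fixed_chunks(original, 4)
after = fixed_chunks(edited, 4)
shared = set(before) & set(after)   # blocks that could be deduped
```

`shared` ends up empty here, even though the two inputs differ by one byte.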

13
Variable Block Chunking
● File is chunked into variable-length blocks
● Block size is determined by content
● Rolling hash algorithm: Rabin-Karp
$\mathrm{RHash} = p^{n-1}\,a[0] + p^{n-2}\,a[1] + p^{n-3}\,a[2] + \dots + p\,a[n-2] + a[n-1]$
If (RHash & fingerprint) == 0 { Chunk! }
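
A minimal sketch of content-defined chunking with a polynomial rolling hash of the kind shown above. All constants are assumptions: the base `P`, the modulus `MOD`, the window length `WINDOW`, and the bit mask `MASK`, which plays the role of "fingerprint" in the slide. When the window slides by one byte the hash is updated in constant time as `(h - oldest * p^(n-1)) * p + newest`, rather than recomputed from scratch.

```python
import random

P = 257                # base of the polynomial (assumed value)
MOD = (1 << 61) - 1    # large prime modulus to keep the hash bounded (assumed)
WINDOW = 16            # n: number of bytes covered by the rolling hash (assumed)
MASK = (1 << 6) - 1    # stands in for "fingerprint": about 1 cut per 64 bytes
P_TOP = pow(P, WINDOW - 1, MOD)   # p^(n-1), the weight of the oldest byte


def chunk(data: bytes):
    """Cut data wherever the hash of the last WINDOW bytes has its low bits zero."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Slide the window: remove the contribution of the oldest byte
            h = (h - data[i - WINDOW] * P_TOP) % MOD
        h = (h * P + byte) % MOD
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])   # trailing bytes after the last cut
    return chunks


rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(4096))
orig_chunks = chunk(data)
new_chunks = chunk(b"!" + data)       # one byte inserted at the front
reused = set(orig_chunks[1:]) <= set(new_chunks)
```

Because each cut depends only on the bytes inside the window, the inserted byte can disturb only the first chunk. Every later chunk of the original reappears unchanged, so `reused` is true. The sketch enforces no minimum or maximum chunk size, which a real chunker would normally add.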

15
Variable Block Chunking
● Pros: Space Efficiency!
● Cons: Slower!

16
Where is it Deduped?
● Client Side
  ● Pros: Less network traffic
  ● Cons: Heavier clients
    ● CPU/Memory
    ● Metadata storage

17
Where is it Deduped?
● Server Side
  ● Pros: Lighter clients
  ● Cons: More network traffic
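
The trade-off between the two placements can be sketched by counting payload bytes sent over the "network". The `Server` class and both backup functions are hypothetical, and the size of the hash exchange itself is ignored for simplicity.

```python
import hashlib


class Server:
    def __init__(self):
        self.blocks = {}   # digest -> block

    def missing(self, digests):
        """Tell the client which digests the server has not stored yet."""
        return [d for d in digests if d not in self.blocks]

    def upload(self, digest, block):
        self.blocks[digest] = block


def client_side_backup(server, blocks):
    """Hash locally, then send only blocks the server lacks. Returns bytes sent."""
    by_digest = {hashlib.sha256(b).hexdigest(): b for b in blocks}
    sent = 0
    for digest in server.missing(list(by_digest)):
        server.upload(digest, by_digest[digest])
        sent += len(by_digest[digest])
    return sent


def server_side_backup(server, blocks):
    """Send every block; the server side does the hashing. Returns bytes sent."""
    sent = 0
    for block in blocks:
        sent += len(block)
        server.upload(hashlib.sha256(block).hexdigest(), block)
    return sent


blocks = [b"a" * 100, b"b" * 100, b"a" * 100]   # third block repeats the first
client_bytes = client_side_backup(Server(), blocks)
server_bytes = server_side_backup(Server(), blocks)
```

The client-side path sends 200 payload bytes and the server-side path sends 300. In exchange, the client has to compute the hashes and keep the digest table itself.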

18
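When is it Deduped?
● Inline: data is deduped in the write path, before it is stored
● Offline: data is stored first and deduped later by a background pass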
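
A rough sketch of the difference in timing, with the "disk" modelled as a Python container. The class names and the `dedupe_pass` method are made up for illustration.

```python
import hashlib


def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class InlineStore:
    """Dedupes in the write path: duplicates never reach the disk."""

    def __init__(self):
        self.disk = {}       # digest -> block

    def write(self, block):
        self.disk.setdefault(digest(block), block)


class OfflineStore:
    """Writes everything first; a later pass reclaims duplicate space."""

    def __init__(self):
        self.disk = []       # raw blocks, duplicates included

    def write(self, block):
        self.disk.append(block)

    def dedupe_pass(self):
        unique = {}
        for block in self.disk:
            unique.setdefault(digest(block), block)
        self.disk = list(unique.values())


writes = [b"x" * 8, b"y" * 8, b"x" * 8]
inline, offline = InlineStore(), OfflineStore()
for w in writes:
    inline.write(w)
    offline.write(w)
peak_offline = len(offline.disk)   # space used before the background pass
offline.dedupe_pass()
```

Both stores end up with two blocks. The inline store pays the hashing cost on every write, while the offline store briefly needs room for all three blocks.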

19
Challenges in Dedupe
● Single point of failure
“Last line of defense! Or fall off the cliff!”
● Performance
● Distributed Dedupe

20
Current Work: YADL
● “Yet Another Dedupe Library”
● Stream-based, user-space dedupe library
● File or Object or Block
● The future: YADL-E

21
Current Work: YADL
● https://github.com/YADL/yadl
● Contributors:
  ● Ewen Pinto (ewenpin@gmail.com)
  ● Srinivas B (srinivasbillav@gmail.com)
  ● Karthik US (kus.karthikus9@gmail.com)
  ● Sukumar Poojary (sukumarpoojari92@gmail.com)

Editor's Notes

● Search should be precise and fast
● Should have a rich metadata filter: modification frequency, IO sizes, etc.
● Should deal with the distributed nature of the data
● Should do load balancing
