DATA DEDUPLICATION IN CLOUD
STORAGE
Presented by
M Praveen Kumar
What is cloud storage?
• Cloud computing is an emerging computing paradigm in
which resources of the computing infrastructure are
provided as services over the Internet.
Basic characteristics of cloud:
• On-demand self-service
• Broad network access
• Resource pooling
• Measured service
• Rapid elasticity
What is deduplication?
• Deduplication is a technique that eliminates redundant
data by storing only a single copy of each file or block.
• It reduces the space and bandwidth requirements of data
storage services such as cloud storage.
• It provides major savings in backup environments (more
than 90% in common business scenarios).
• It is regarded as one of the most impactful storage
technologies, as a series of acquisitions suggests:
– In April 2008, IBM acquired Diligent
– In July 2009, EMC acquired Data Domain
– In July 2010, Dell acquired Ocarina
How are files deduplicated?
• Fingerprint each file using a hash function
– Common hashes used: SHA-1, SHA-256, others…
– Store an index of all the hashes already in the system
• New file:
– Compute hash
– Look hash up in index table
– If new → store the data and add the hash to the index
– If known hash → store as pointer to existing data
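The lookup steps above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production design: the class name FileDedupStore and its put/get methods are hypothetical, and SHA-256 is picked from the hashes listed above.

```python
import hashlib


class FileDedupStore:
    """Toy in-memory store (hypothetical): one copy per distinct fingerprint."""

    def __init__(self):
        self.index = {}  # fingerprint -> stored bytes (the single copy)
        self.files = {}  # file name -> fingerprint (the "pointer")

    def put(self, name, data):
        """Store a file; return True if its content was new to the system."""
        fingerprint = hashlib.sha256(data).hexdigest()  # compute hash
        is_new = fingerprint not in self.index          # look hash up in index
        if is_new:
            self.index[fingerprint] = data              # new: store and index it
        self.files[name] = fingerprint                  # always record the pointer
        return is_new

    def get(self, name):
        """Follow the pointer from the file name to the single stored copy."""
        return self.index[self.files[name]]


store = FileDedupStore()
store.put("a.txt", b"hello world")
store.put("copy-of-a.txt", b"hello world")  # duplicate content, no new copy
store.put("b.txt", b"something else")
print(len(store.files), "files,", len(store.index), "stored copies")
```

Three file names end up pointing at two stored copies, because the second upload only adds a pointer.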
Deduplication strategies
• There are two main deduplication strategies:
– File-level deduplication: only a single copy of each file is
stored, identified by its hash value.
– Block-level deduplication: each file is split into blocks, and
only a single copy of each identical block is stored. Blocks
can be fixed-size or variable-size.
Deduplication approaches also differ in where and when they run:
• Location: Deduplication can be performed at different
locations. Depending on the participating machines
and the steps in the customized deduplication process, it
is performed either on the client machine (source-side)
or near the final data store (target-side).
• Source-side deduplication conserves network bandwidth,
because duplicate data is never sent.
• Time: Depending on the use case, deduplication is
either performed as the data is transferred from the
source to the target (in-band or in-line) or
asynchronously at well-defined intervals (out-of-band
or post-process).
• Depending on the amount of data, in-line deduplication
can cause a throughput bottleneck on the server side.
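To make the bandwidth difference between the two locations concrete, here is a small sketch under stated assumptions. The Server class and both backup functions are hypothetical, and the cost of sending the fingerprints themselves is ignored.

```python
import hashlib


class Server:
    """Hypothetical storage server that counts the payload bytes it receives."""

    def __init__(self):
        self.chunks = {}        # fingerprint -> chunk data
        self.bytes_received = 0

    def has(self, fingerprint):
        return fingerprint in self.chunks

    def upload(self, data):
        self.bytes_received += len(data)
        self.chunks.setdefault(hashlib.sha256(data).hexdigest(), data)


def source_side_backup(server, chunks):
    """Client fingerprints each chunk and sends only what the server lacks."""
    for chunk in chunks:
        if not server.has(hashlib.sha256(chunk).hexdigest()):
            server.upload(chunk)


def target_side_backup(server, chunks):
    """Client sends everything; duplicates are dropped on the server."""
    for chunk in chunks:
        server.upload(chunk)


data = [b"A" * 1024, b"B" * 1024, b"A" * 1024, b"A" * 1024]

source_server, target_server = Server(), Server()
source_side_backup(source_server, data)
target_side_backup(target_server, data)
print("source-side bytes sent:", source_server.bytes_received)
print("target-side bytes sent:", target_server.bytes_received)
```

Both servers end up storing the same two unique chunks. Only the amount of data crossing the network differs.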
Applications
Because it reduces storage requirements, deduplication is
widely used in
• Backup systems
• Disaster recovery
Deduplication methods
• Deduplication methods differ greatly in how redundant
data is identified.
• Depending on the requirements of the application
and the characteristics of the data, they are categorised
as
– single instance storage
– fixed size chunking
– variable size chunking
– file type aware chunking
Single instance storage
It does not break files into smaller chunks but
rather treats each entire file as a single chunk.
This method eliminates only duplicate files and does
not detect redundancy between files that differ in just a few bytes.
Advantages:
Indexing performance
Low CPU usage
Disadvantage:
Not suitable for large files with changing data
Fixed size chunking
Instead of using the entire file as the smallest unit, it breaks
files into equally sized chunks.
If a large file is changed, only the changed chunks must
be re-indexed and transferred to the backup location.
Disadvantage:
It fails to detect redundant data if some bytes are
inserted into or deleted from the file, because chunk
boundaries are determined by offset rather than by
content.
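The following sketch illustrates both the benefit and the boundary-shift problem of fixed size chunking. The 512-byte chunk size, the helper names, and the sample data are assumptions made for this example.

```python
import hashlib

CHUNK_SIZE = 512  # assumed chunk size for this illustration


def fixed_chunks(data, size=CHUNK_SIZE):
    """Split data into equally sized chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprints(data):
    """Set of SHA-256 fingerprints of the fixed-size chunks of data."""
    return {hashlib.sha256(c).hexdigest() for c in fixed_chunks(data)}


# 4096 bytes of deterministic, non-repeating sample data (8 chunks).
original = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))

# Case 1: one byte overwritten in place; chunk boundaries stay where they were.
overwritten = bytearray(original)
overwritten[1500] ^= 0xFF
overwritten = bytes(overwritten)

# Case 2: one byte inserted at the front; every later byte shifts by one offset.
inserted = b"\x00" + original

base = fingerprints(original)
reusable_after_overwrite = len(base & fingerprints(overwritten))
reusable_after_insert = len(base & fingerprints(inserted))
print(reusable_after_overwrite, "of", len(base), "chunks reusable after overwrite")
print(reusable_after_insert, "of", len(base), "chunks reusable after insert")
```

After the in-place overwrite, 7 of the 8 chunks are still recognised. After the one-byte insert, none are, because every chunk boundary now falls on different content.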
Variable size chunking
It defines breakpoints wherever a certain condition
becomes true.
This is usually done with a fixed-size, overlapping sliding
window.
At every offset of a file, the contents of the sliding
window are analyzed and a fingerprint f is
calculated.
If f satisfies the break condition, a new breakpoint
has been found and a new chunk is created.
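A minimal sketch of this sliding-window scheme follows. The window size, the use of CRC-32 as the fingerprint f, and the break condition (low six bits of f are zero) are all assumptions chosen for brevity. Real systems typically use a rolling hash such as a Rabin fingerprint, so that f can be updated per byte instead of recomputed, and they also enforce minimum and maximum chunk sizes.

```python
import hashlib
import zlib

WINDOW = 16   # assumed sliding-window size in bytes
MASK = 0x3F   # assumed break condition: low 6 bits of f are zero


def variable_chunks(data):
    """Cut a chunk wherever the fingerprint of the last WINDOW bytes matches."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        f = zlib.crc32(data[end - WINDOW:end])  # fingerprint of the window
        if f & MASK == 0:                       # break condition satisfied
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])             # trailing bytes form the last chunk
    return chunks


def fingerprints(data):
    """Set of SHA-256 fingerprints of the variable-size chunks of data."""
    return {hashlib.sha256(c).hexdigest() for c in variable_chunks(data)}


# 4096 bytes of deterministic sample data, then the same data with one
# byte inserted at the front.
original = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))
inserted = b"\x00" + original

base = fingerprints(original)
shared = len(base & fingerprints(inserted))
print(shared, "of", len(base), "chunks still recognised after the insert")
```

Because breakpoints depend only on the bytes inside the window, they move together with the content. In this sketch, an insert at the front can change at most the first chunk, and every later chunk keeps its fingerprint.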
File type aware chunking
The best redundancy detection can be achieved
if the chunking method understands the structure of
the data stream.
Advantages:
Breakpoints are more natural
Better space savings
Future scope
• The solutions described above provide security in different
ways, but none of them covers all the vulnerabilities. In
addition, no method has yet been proposed
that can establish trust between the user and the cloud
service provider. The client is never assured that his files are
known only to him and to no one else, including the SSP.
Therefore we propose a solution whose main aim is to
establish trust between the user and the SSP. This method
involves installing an application on the client-side LAN. The
abstract view of the setup is shown in figure 1. The various
modules and the working of the system are described below:
CONCLUSION
• THANK YOU

Deduplication in Open Source Cloud

  • 1. DATA DEDUPLICATION IN CLOUD STORAGE Presented by M Praveen Kumar
  • 2. What is cloud storage? Cloud computing is an emerging computing paradigm in which resources of the computing infrastructure are provided as services over the Internet. Basic characteristics of cloud: on-demand self-service; broad network access; resource pooling; measured service; rapid elasticity.
  • 3. What is deduplication? Deduplication is a technique that eliminates redundant data by storing only a single copy of each file or block. It reduces the space and bandwidth requirements of data storage services such as cloud storage. It provides major savings in backup environments (more than 90% in common business scenarios).
  • 4. Deduplication has been one of the most impactful storage technologies, as a series of vendor acquisitions shows: in April 2008, IBM acquired Diligent; in July 2009, EMC acquired Data Domain; in July 2010, Dell acquired Ocarina.
  • 5. How are files deduped? Fingerprint each file using a hash function. Common hashes used: SHA-1, SHA-256, others… Store an index of all the hashes already in the system. For a new file: compute the hash; look the hash up in the index table; if new → add to index; if known hash → store as a pointer to existing data.
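The lookup procedure on slide 5 can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the presenter's implementation: the class name `FileDedupStore`, its two dictionaries, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


class FileDedupStore:
    """Toy in-memory store (hypothetical): one copy of content per SHA-256 digest."""

    def __init__(self):
        self.blobs = {}   # digest -> file content, stored once (the hash index)
        self.names = {}   # file name -> digest (the "pointer" to existing data)

    def put(self, name: str, data: bytes) -> bool:
        """Store a file; return True if its content was new to the index."""
        digest = hashlib.sha256(data).hexdigest()   # compute hash
        is_new = digest not in self.blobs           # look hash up in index
        if is_new:
            self.blobs[digest] = data               # new -> add to index
        self.names[name] = digest                   # known or new -> keep a pointer
        return is_new

    def get(self, name: str) -> bytes:
        """Follow the pointer back to the single stored copy."""
        return self.blobs[self.names[name]]


store = FileDedupStore()
first = store.put("report.txt", b"quarterly numbers")
second = store.put("report-copy.txt", b"quarterly numbers")
```

Here the second `put` stores only a name-to-digest entry; the content itself is kept once.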
  • 6. Deduplication strategies: There are two main deduplication strategies. File-level deduplication, where only a single copy of each file is stored based on its hash value. Block-level deduplication, where each file is divided into blocks and only a single copy of each set of identical blocks is stored. Blocks can be fixed-size or variable-size.
  • 7. This particularly includes the following. Location: Deduplication can be performed at different locations. Depending on the participating machines and steps in the customized deduplication process, it is either performed on the client machine (source-side) or near the final data store (target-side). Since that conserves network bandwidth,
  • 8. Time: Depending on the use case, deduplication is either performed as the data is transferred from the source to the target (in-band or in-line) or asynchronously in well-defined intervals (out-of-band or post-process). Depending on the amount of data, this approach can cause a bottleneck in terms of throughput on the server side.
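To illustrate the location trade-off from slide 7, the sketch below compares how many payload bytes cross the network in each case. Everything here is hypothetical: the `Server` class, the two backup functions, and the idea of counting only payload bytes (ignoring the small hash queries) are assumptions made for the example, not part of the original deck.

```python
import hashlib


class Server:
    """Hypothetical storage target that keeps one copy per digest."""

    def __init__(self):
        self.blobs = {}

    def has(self, digest: str) -> bool:
        return digest in self.blobs

    def upload(self, digest: str, data: bytes) -> None:
        self.blobs[digest] = data


def source_side_backup(server: Server, files: list[bytes]) -> int:
    """Hash on the client and send only unknown content; return payload bytes sent."""
    sent = 0
    for data in files:
        digest = hashlib.sha256(data).hexdigest()   # computed on the client
        if not server.has(digest):                  # small query, not counted
            server.upload(digest, data)
            sent += len(data)
    return sent


def target_side_backup(server: Server, files: list[bytes]) -> int:
    """Send everything; the server hashes and deduplicates on arrival."""
    sent = 0
    for data in files:
        sent += len(data)                           # full payload always travels
        digest = hashlib.sha256(data).hexdigest()   # computed on the server
        if not server.has(digest):
            server.upload(digest, data)
    return sent


files = [b"A" * 1000, b"A" * 1000, b"B" * 500]
source_bytes = source_side_backup(Server(), files)   # 1500
target_bytes = target_side_backup(Server(), files)   # 2500
```

Both variants end up storing the same two unique blobs; they differ in where the hashing work happens and in how much data is transferred.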
  • 9. Applications: Due to its storage-reducing nature, it is widely used in backup systems and disaster recovery.
  • 10. Deduplication methods: Deduplication methods differ greatly in how the redundant data is identified. Depending on the requirements of the application and the characteristics of the data, they are categorised as: single instance storage; fixed-size chunking; variable-size chunking; file-type-aware chunking.
  • 11. Single instance storage: It does not break files into smaller chunks but rather uses entire files as chunks. This method only eliminates duplicate files and cannot detect redundancy between files that differ in just a few bytes. Advantages: indexing performance; low CPU usage. Disadvantage: cannot be applied to large files with changing data.
  • 12. Fixed size chunking: Instead of using the entire file as the smallest unit, it breaks files into equally sized chunks. If a large file is changed, only the changed chunks must be re-indexed and transferred to the backup location. Disadvantage: it fails to detect redundant data if bytes are inserted into or deleted from the file, because chunk boundaries are determined by offset rather than by content.
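Both the benefit and the drawback on slide 12 can be seen in a small sketch. The helper names and the 4-byte chunk size are assumptions chosen to keep the example readable; real systems use chunks of several kilobytes.

```python
import hashlib


def fixed_chunks(data: bytes, size: int = 4) -> list[bytes]:
    """Split data into equally sized chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_hashes(data: bytes, size: int = 4) -> list[str]:
    """Fingerprint every fixed-size chunk with SHA-256."""
    return [hashlib.sha256(c).hexdigest() for c in fixed_chunks(data, size)]


original = b"AAAABBBBCCCCDDDD"
overwritten = b"AAAAXXXXCCCCDDDD"   # bytes changed in place, offsets unchanged
inserted = b"!" + original          # one byte inserted at the front

known = set(chunk_hashes(original))

# In-place change: only the chunk that was touched is new.
new_after_overwrite = [h for h in chunk_hashes(overwritten) if h not in known]

# Insertion: every boundary shifts by one byte, so no chunk matches any more.
new_after_insert = [h for h in chunk_hashes(inserted) if h not in known]
```

After the in-place overwrite only one of four chunks needs to be re-indexed, while the single inserted byte makes all five resulting chunks look new even though almost all of the data is unchanged.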
  • 13. Variable sized chunking: It defines breakpoints where a certain condition becomes true. This is usually done with a fixed-size, overlapping sliding window. At every offset of a file, the contents of the sliding window are analyzed and a fingerprint f is calculated. If f satisfies the break condition, a new breakpoint has been found and a new chunk is created.
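The sliding-window procedure on slide 13 might look roughly like the sketch below. Several choices are assumptions rather than anything stated in the deck: CRC-32 stands in for the fingerprint f, the break condition is "the low six bits of f are zero", and the window is 16 bytes. For clarity the fingerprint is recomputed from scratch at every offset; production systems commonly use a rolling hash such as a Rabin fingerprint so that it can be updated incrementally.

```python
import random
import zlib


def variable_chunks(data: bytes, window: int = 16, mask: int = 0x3F) -> list[bytes]:
    """Cut a chunk wherever the fingerprint of the trailing window meets the break condition."""
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        f = zlib.crc32(data[end - window:end])   # fingerprint of the sliding window
        if f & mask == 0:                        # break condition (assumed)
            chunks.append(data[start:end])       # new breakpoint -> new chunk
            start = end
    if start < len(data):
        chunks.append(data[start:])              # trailing remainder
    return chunks


rng = random.Random(0)                           # fixed seed for repeatable sample data
original = bytes(rng.getrandbits(8) for _ in range(4000))
inserted = b"!" + original                       # one byte inserted at the front

old = variable_chunks(original)
new = variable_chunks(inserted)
shared = set(old) & set(new)                     # chunks that need no re-transfer
```

Because each breakpoint depends only on the bytes inside the window, the inserted byte can disturb at most the chunks near the front; every chunk after the first breakpoint of the original data reappears unchanged, which is exactly what fixed-size chunking fails to achieve.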
  • 14. File type aware chunking: The best redundancy detection can be achieved if the data stream is understood by the chunking method. Advantages: breakpoints are more natural; better space savings.
  • 15. Future scope: The solutions described above provide security in different ways, but none of them covers all vulnerabilities. In addition, no method has been proposed that can establish trust between the user and the cloud service provider. The client is never assured that his files are known only to him and to no one else, including the SSP. Therefore we propose a solution whose main aim is to establish trust between the user and the SSP. This method involves injecting an application on the client-side LAN. The abstract view of the setup is shown in figure 1. The various modules and the working of the system are described below: