IDL - International Digital Library Of
Technology & Research
Volume 1, Issue 5, May 2017 Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
IDL - International Digital Library 1|P a g e Copyright@IDL-2017
Secure and Efficient Client and Server Side
Data Deduplication to Reduce Storage in
Remote Cloud Computing Systems
Hemanth Chandra N¹, Sahana D. Gowda²
Dept. of Computer Science
¹ B.N.M Institute of Technology, Bangalore, India
² B.N.M Institute of Technology, Bangalore, India
Abstract: Duplication of data in storage systems is becoming an increasingly common problem. The system introduces I/O Deduplication, a storage optimization that utilizes content similarity to improve I/O performance by eliminating I/O operations and reducing the mechanical delays during I/O operations, and that shares data with existing users when duplication is found on the client or server side. I/O Deduplication consists of three main techniques: content-based caching, dynamic replica retrieval and selective duplication. Each of these techniques is motivated by our observations with I/O workload traces obtained from actively-used production storage systems, all of which revealed surprisingly high levels of content similarity for both stored and accessed data.
Keywords: Deduplication, POD, Data Redundancy,
Storage Optimization, iDedup, I/O Deduplication,
I/O Performance.
1. INTRODUCTION
Duplication of data in primary storage systems is quite common due to the technological trends that have been driving storage capacity consolidation. The elimination of duplicate content at both the file and block levels for improving storage space utilization is an active area of research. Indeed, eliminating most duplicate content is inevitable in capacity-sensitive applications such as archival storage, for cost effectiveness. On the
other hand, there exist systems with a moderate degree
of content similarity in their primary storage such as
email servers, virtualized servers and NAS devices
running file and version control servers. In case of
email servers, mailing lists, circulated attachments and
SPAM can lead to duplication. Virtual machines may
run similar software and thus create collocated
duplicate content across their virtual disks. Finally, file
and version control systems servers of collaborative
groups often store copies of the same documents,
sources and executables. In such systems, if the degree
of content similarity is not overwhelming, eliminating
duplicate data may not be a primary concern.
Gray and Shenoy [1] have pointed out that, given the technology trends for price-capacity and price-performance of memory/disk sizes and disk accesses respectively, disk data must "cool" at the rate of 10X per decade. They suggest data replication as a means to this end. An instantiation of this suggestion is the intrinsic replication of data created by consolidation, as now seen in many storage systems, including the ones illustrated earlier. Here, intrinsic replication refers to application- or user-generated data replication, as opposed to forced (system-generated) redundancy such as in a RAID-1 storage system. In such systems, capacity constraints are invariably secondary to I/O performance.
On-disk duplication of content and I/O traces obtained from three varied production systems at Florida International University (FIU) have been analyzed; these systems included a virtualized host running two department web servers, the department email server, and a file server for our research group. Three observations have been made from the analysis of these traces.
First, our analysis revealed significant levels of both
disk static similarity and workload static similarity
within each of these systems. Disk static similarity is
an indicator of the amount of duplicate content in the
storage medium, while workload static similarity
indicates the degree of on-disk duplicate content
accessed by the I/O workload. These similarity
measures have been defined formally in § 2. Second, a
consistent and marked discrepancy was discovered
between reuse distances for sector and content in the
I/O accesses on these systems indicating that content
is reused more frequently than sectors. Third, there is
significant overlap in content accessed over successive
intervals of longer time-frames such as days or weeks.
Based on these observations, the premise explored is that intrinsic content similarity in storage systems and access to replicated content within I/O workloads can both be utilized to improve I/O performance. In doing so, a storage optimization that utilizes content similarity to either eliminate I/O operations altogether or optimize the resulting disk head movement within the storage system is designed and evaluated as I/O Deduplication. I/O Deduplication comprises three key techniques: (i) content-based caching, which uses the popularity of "data content" rather than "data location" of I/O accesses in making caching decisions; (ii) dynamic replica retrieval, which, upon a cache miss for a read operation, dynamically chooses to retrieve a content replica that minimizes disk head movement; and (iii) selective duplication, which dynamically replicates frequently accessed content in scratch space distributed over the entire storage medium to increase the effectiveness of dynamic replica retrieval.
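As a minimal illustration of the content-based caching idea, the sketch below (in Python, with hypothetical names, and assuming a sector-to-fingerprint map is maintained out of band, as I/O Deduplication's metadata would provide) keys the cache on a hash of block contents rather than on the sector number, so a read of a never-before-read sector whose content duplicates an already-cached block still hits:

```python
import hashlib
from collections import OrderedDict

class ContentCache:
    """LRU cache keyed by block-content fingerprint instead of sector number."""

    def __init__(self, capacity, sector_fingerprints):
        self.capacity = capacity
        self.meta = sector_fingerprints   # sector -> fingerprint, maintained out of band
        self.blocks = OrderedDict()       # fingerprint -> block data (LRU order)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def fingerprint(data):
        return hashlib.sha256(data).hexdigest()

    def read(self, sector, disk):
        h = self.meta[sector]
        if h in self.blocks:
            self.blocks.move_to_end(h)    # hit: any sector with identical content hits
            self.hits += 1
            return self.blocks[h]
        self.misses += 1
        data = disk[sector]               # miss: fetch from the backing store
        self.blocks[h] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least-recently-used content entry
        return data
```

Because the cache holds one copy per unique content, duplicate blocks consume a single cache entry, which is what raises hit rates relative to sector-based caching.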
2. RELATED WORK
This paper is related to work on Deduplication concepts in primary memory. Nimrod Megiddo and Dharmendra S. Modha [20] discussed the problem of cache management in a demand-paging scenario with uniform page sizes. A new cache
management policy is proposed, namely, Adaptive
Replacement Cache (ARC), that has several
advantages. In response to evolving and changing
access patterns, ARC dynamically, adaptively and
continually balances between the recency and
frequency components in an online and self-tuning
fashion. The policy ARC uses a learning rule to
adaptively and continually revise its assumptions
about the workload. The policy ARC is empirically
universal, that is, it empirically performs as well as a
certain fixed replacement policy– even when the latter
uses the best workload-specific tuning parameter that
was selected in an offline fashion. Consequently, ARC
works uniformly well across varied workloads and
cache sizes without any need for workload specific a
priori knowledge or tuning. Various policies such as
LRU-2, 2Q, LRFU and LIRS require user-defined
parameters, and unfortunately, no single choice works
uniformly well across different workloads and cache
sizes. The policy ARC is simple-to-implement and
like LRU, has constant complexity per request. In
comparison, policies LRU-2 and LRFU both require
logarithmic time complexity in the cache size. The
policy ARC is scan-resistant: it allows one-time
sequential requests to pass through without polluting
the cache. On real-life traces drawn from numerous
domains, ARC leads to substantial performance gains
over LRU for a wide range of cache sizes.
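The ARC policy described above can be rendered compactly; the following is a simplified, illustrative sketch of the algorithm from [20] for unit-size pages (resident lists T1/T2, ghost lists B1/B2, adaptive target p), not a production implementation:

```python
from collections import OrderedDict

class ARC:
    """Simplified sketch of the Adaptive Replacement Cache policy."""

    def __init__(self, c):
        self.c = c                 # cache capacity in pages
        self.p = 0                 # adaptive target size for T1
        self.t1 = OrderedDict()    # resident: seen once recently
        self.t2 = OrderedDict()    # resident: seen at least twice
        self.b1 = OrderedDict()    # ghost history of T1 evictions
        self.b2 = OrderedDict()    # ghost history of T2 evictions

    def _replace(self, in_b2):
        # Evict from T1 or T2 according to the adaptive target p.
        if self.t1 and (len(self.t1) > self.p or (in_b2 and len(self.t1) == self.p)):
            k, _ = self.t1.popitem(last=False)
            self.b1[k] = None
        else:
            k, _ = self.t2.popitem(last=False)
            self.b2[k] = None

    def access(self, key):
        """Return True on a cache hit, False on a miss."""
        if key in self.t1:                      # hit in T1: promote to T2
            del self.t1[key]; self.t2[key] = None; return True
        if key in self.t2:                      # hit in T2: refresh MRU position
            self.t2.move_to_end(key); return True
        if key in self.b1:                      # ghost hit: favour recency, grow p
            self.p = min(self.c, self.p + max(len(self.b2) // len(self.b1), 1))
            self._replace(False); del self.b1[key]; self.t2[key] = None; return False
        if key in self.b2:                      # ghost hit: favour frequency, shrink p
            self.p = max(0, self.p - max(len(self.b1) // len(self.b2), 1))
            self._replace(True); del self.b2[key]; self.t2[key] = None; return False
        # Complete miss: make room, then insert at MRU of T1.
        if len(self.t1) + len(self.b1) == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False); self._replace(False)
            else:
                self.t1.popitem(last=False)     # B1 empty: discard LRU of T1
        elif len(self.t1) + len(self.b1) < self.c:
            total = len(self.t1) + len(self.t2) + len(self.b1) + len(self.b2)
            if total >= self.c:
                if total == 2 * self.c:
                    self.b2.popitem(last=False)
                self._replace(False)
        self.t1[key] = None
        return False
```

In this sketch a page accessed twice survives a long one-time scan of fresh pages, illustrating the scan resistance the text attributes to ARC.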
Aayush Gupta et al. [22] have designed CA-SSD, which employs content-addressable storage (CAS) to exploit such locality. The CA-SSD design places its enhancements primarily in the flash translation layer (FTL) with minimal additional hardware, suggesting its feasibility. Using three real-world workloads with content information, statistical characterizations of two aspects of value locality - value popularity and temporal value locality - that form the foundation of CA-SSD are devised. CA-SSD is able to reduce average response times by about 59-84% compared to traditional SSDs. Even for workloads with little or no value locality, CA-SSD continues to offer performance comparable to a traditional SSD. These findings advocate the adoption of CAS in SSDs, paving the way for a new generation of these devices.
Chi Zhang, Xiang Yu, and Y. Wang [23] have answered two key questions. First, since both eager-writing and mirroring rely on extra capacity to deliver performance improvements, how can competing resource demands be satisfied given a fixed amount of total disk space? Second, since eager-writing allows data to be dynamically located, how can this high degree of location independence be exploited in an intelligent disk scheduler? In their paper, these two key questions were addressed and the resulting EW-Array prototype's performance was compared against that of conventional
approaches. The experimental results demonstrate that
the eager writing disk array is an effective approach to
providing scalable performance for an important class
of transaction processing applications.
Kiran Srinivasan et al. [24] have proposed an inline Deduplication solution, iDedup, for primary workloads, while minimizing extra I/Os and seeks. The
algorithm is based on two key insights from real world
workloads: i) spatial locality exists in duplicated
primary data and ii) temporal locality exists in the
access patterns of duplicated data. Using the first
insight, only sequences of disk blocks were selectively
deduplicated. This reduces fragmentation and
amortizes the seeks caused by Deduplication. The
second insight allows us to replace the expensive on-disk Deduplication metadata with a smaller in-memory cache. These techniques make it possible to trade off
capacity savings for performance, as demonstrated in
our evaluation with real-world workloads. Our
evaluation shows that iDedup achieves 60-70% of the
maximum Deduplication with less than a 5% CPU
overhead and a 2-4% latency impact.
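The first iDedup insight - deduplicating only sufficiently long runs of consecutive duplicate blocks, so the seeks caused by Deduplication are amortized - can be illustrated with a short sketch (hypothetical function and store names; real iDedup operates inside the file system write path):

```python
import hashlib

def select_dedup_sequences(blocks, fingerprint_store, threshold=4):
    """Return a per-block decision list: True means deduplicate (skip the write).

    Only runs of at least `threshold` consecutive blocks whose fingerprints
    are already in `fingerprint_store` are deduplicated; shorter runs are
    written normally to limit on-disk fragmentation.
    """
    fps = [hashlib.sha256(b).hexdigest() for b in blocks]
    dup = [fp in fingerprint_store for fp in fps]
    decisions = [False] * len(blocks)
    i = 0
    while i < len(blocks):
        if dup[i]:
            j = i
            while j < len(blocks) and dup[j]:
                j += 1                       # extend the run of duplicate blocks
            if j - i >= threshold:           # long enough to amortize extra seeks
                for k in range(i, j):
                    decisions[k] = True
            i = j
        else:
            i += 1
    return decisions
```

Raising the threshold trades capacity savings for sequentiality, which is exactly the capacity-versus-performance knob the evaluation above refers to.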
Jiri Schindler et al. [26] introduce proximal I/O, a new technique for improving random disk I/O performance in file systems. The key enabling technology for proximal I/O is the ability of disk drives to retire multiple I/Os, spread across dozens of tracks, in a single revolution. Compared to traditional update-in-place or write-anywhere file systems, this technique
can provide a nearly seven-fold improvement in
random I/O performance while maintaining (near)
sequential on-disk layout. This paper quantifies
proximal I/O performance and proposes a simple data
layout engine that uses a flash memory-based write
cache to aggregate random updates until they have
sufficient density to exploit proximal I/O. The results show that with a cache of just 1% of the overall disk-based storage capacity, it is possible to service 5.3 user I/O requests per revolution for a random-updates workload. On an aged file system, the layout can
sustain serial read bandwidth within 3% of the best
case. Despite using flash memory, the overall system
cost is just one third of that of a system with the
requisite number of spindles to achieve the equivalent
number of random I/O operations.
2.1 EXISTING SYSTEM
The existing data Deduplication schemes for primary storage, such as iDedup and Offline-Dedupe, are capacity-oriented in that they focus on storage capacity savings: they select only the large requests to deduplicate and bypass all the small requests (e.g., 4 KB, 8 KB or less). The rationale is that the small I/O
requests only account for a tiny fraction of the storage
capacity requirement, making Deduplication on them
unprofitable and potentially counterproductive
considering the substantial Deduplication overhead
involved. However, previous workload studies have
revealed that small files dominate in primary storage
systems (more than 50 percent) and are at the root of the
system performance bottleneck. Furthermore, due to the
buffer effect, primary storage workloads exhibit obvious
I/O burstiness. To address the important performance issue of primary storage in the Cloud and the above Deduplication-induced problems, a Performance-Oriented data Deduplication scheme, called POD, rather than a capacity-oriented one (e.g., iDedup), is proposed to improve the I/O performance of primary storage systems in the Cloud. Figure 1 represents the architecture of POD. Considering the workload characteristics, POD takes a two-pronged approach to improving the performance of primary storage systems and minimizing the performance overhead of Deduplication: a request-based selective Deduplication technique, called Select-Dedupe, to alleviate data fragmentation, and an adaptive memory management scheme, called iCache, to ease the memory contention between the bursty read traffic and the bursty write traffic.
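A request-based selective Deduplication step of the kind Select-Dedupe performs can be sketched as follows (a simplified, hypothetical interface: a redundant write request is dropped and its logical address remapped instead of reaching the disk):

```python
import hashlib

class SelectDedupe:
    """Sketch of request-based write Deduplication on the I/O path."""

    def __init__(self):
        self.store = {}        # fingerprint -> physical block address
        self.mapping = {}      # logical block address -> physical block address
        self.disk_writes = 0   # writes that actually reached the disk
        self.next_pba = 0

    def write(self, lba, data):
        fp = hashlib.sha256(data).hexdigest()
        if fp in self.store:
            # Redundant request: remap to the existing copy, no disk write.
            self.mapping[lba] = self.store[fp]
            return
        pba = self.next_pba
        self.next_pba += 1
        self.disk_writes += 1  # unique content is written out once
        self.store[fp] = pba
        self.mapping[lba] = pba
```

Unlike the capacity-oriented schemes discussed above, a performance-oriented scheme applies this check to small requests as well, since removing those writes is what relieves the critical I/O path.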
2.2 PROPOSED SYSTEM (POD vs iDedup)
A possible future direction is to optionally coalesce, or even eliminate altogether, write I/O operations for content that is already duplicated elsewhere on the disk, or alternatively to direct such writes to alternate locations in the scratch space.
Figure 1. Architecture diagram of POD.
While the first option might seem similar to data Deduplication at a high level, a primary focus on the performance implications of such optimizations, rather than capacity improvements, has been suggested. Any optimization for writes affects the read-side optimizations of I/O Deduplication, so a careful analysis and evaluation of the trade-off points in this design space is important; the system shares data with existing users when duplication is found on the client or server side.
Advantages:
1. Requires less space in the storage server.
2. Data Deduplication is performed completely on all blocks.
3. IMPLEMENTATION
POD
In this paper, the system proposes POD, a
performance-oriented Deduplication scheme, to
improve the performance of primary storage systems
in the Cloud by leveraging data Deduplication on the
I/O path to remove redundant write requests while also
saving storage space. Figure 2 shows the sequence diagram, which illustrates how the sequence flow happens between the modules. POD takes a request-based selective Deduplication approach (Select-Dedupe) to deduplicating the I/O redundancy on the critical I/O path in such a way that it minimizes the data fragmentation problem. Meanwhile, an intelligent cache management scheme (iCache) is employed in POD to further improve read performance and increase space savings by adapting to I/O burstiness. Extensive trace-driven evaluations show that POD significantly improves the performance and saves the capacity of primary storage systems in the Cloud.
Figure 2. Sequence diagram of POD.
iDedup
The iDedup system is an inline Deduplication system specifically targeting latency-sensitive, primary storage workloads; it is compared here with more recent techniques such as POD, also referred to as I/O Deduplication. With latency-sensitive workloads, inline Deduplication faces many challenges: fragmentation leading to extra disk seeks for reads, Deduplication processing overheads in the critical path, and extra latency caused by I/Os for Dedup-metadata management.
3.1 COMPARATIVE ANALYSIS
In this section, the performance of POD vs iDedup
Deduplication models is evaluated through extensive
trace-driven experiments.
(a) Time delay in iDedup
(b) Time delay in POD
Figure 3.The time delay performance of the different
Deduplication schemes.
A prototype of POD has been implemented as a module in the Linux operating system, and trace-driven experiments are used to evaluate its effectiveness and efficiency. In this paper, POD is compared with the capacity-oriented scheme iDedup. The two models (POD vs iDedup) are compared on two parameters: time delay and CPU utilization.
(a) CPU utilization in iDedup
(b) CPU utilization in POD
Figure 4. The CPU utilization performance of the
different Deduplication schemes.
For the first parameter, time delay, four production files were used in our trace-driven evaluation. Figure 3(a) shows the time delay for the iDedup model, and Figure 3(b) shows the time delay for the same files in the POD model. For both deduplicator models, the same files, named cmain, owner, clog and search, were uploaded. The results show that POD is more efficient than iDedup. For the second parameter, CPU utilization, a single file named abc.txt was uploaded to both the POD and iDedup models. Figure 4(a) shows the CPU utilization in the iDedup model, and Figure 4(b) shows the CPU utilization in the POD model. The results again show that POD is more efficient than the iDedup model.
4. CONCLUSION
System and storage consolidation trends are driving
increased duplication of data within storage systems.
Past efforts have been primarily directed towards the
elimination of such duplication for improving storage
capacity utilization. With I/O Deduplication, a
contrary view is taken that intrinsic duplication in a
class of systems which are not capacity-bound can be
effectively utilized to improve I/O performance – the
traditional Achilles’ heel for storage systems. Three
techniques contained within I/O Deduplication work
together to either optimize I/O operations or eliminate
them altogether. An in-depth evaluation of these
mechanisms revealed that together they reduced
average disk I/O times by 28-47%, a large improvement, all of which can directly impact the
overall application-level performance of disk I/O
bound systems. The content-based caching mechanism
increased memory caching effectiveness by increasing
cache hit rates by 10% to 4x for read operations when
compared to traditional sector-based caching. Head-
position aware dynamic replica retrieval directed I/O
operations to alternate locations on-the-fly and
additionally reduced I/O times by 10-20%. Selective duplication created additional replicas of popular content during periods of low foreground I/O activity and further improved the effectiveness of dynamic replica retrieval by 23-35%.
FUTURE WORK
I/O Deduplication opens up several directions for
future work. One avenue for future work is to explore
content-based optimizations for write I/O operations.
A possible future direction is to optionally coalesce, or even eliminate altogether, write I/O operations for content that is already duplicated elsewhere on the disk, or alternatively to direct such writes to alternate locations in the scratch space. While the first option might seem similar to data Deduplication at a high level, a primary focus on the performance implications of such optimizations, rather than capacity improvements, is suggested. Any optimization for writes affects the read-side optimizations of I/O Deduplication, so a careful analysis and evaluation of the trade-off points in this design space is important; the system shares data with existing users when duplication is found on the client or server side.
REFERENCES
[1] Jim Gray and Prashant Shenoy. Rules of Thumb in
Data Engineering. Proc. of the IEEE International
Conference on Data Engineering, February 2000.
[2] Charles B. Morrey III and Dirk Grunwald.
Peabody: The Time Travelling Disk. In Proc. of the
IEEE/NASA MSST, 2003.
[3] Windsor W. Hsu, Alan Jay Smith, and Honesty C.
Young. The Automatic Improvement of Locality in
Storage Systems. ACM Transactions on Computer
Systems, 23(4):424–473, Nov 2005.
[4] S. Quinlan and S. Dorward. Venti: A New
Approach to Archival Storage. Proc. of the USENIX
Annual Technical Conference on File and Storage
Technologies, January 2002.
[5] Cyril U. Orji and Jon A. Solworth. Doubly
distorted mirrors. In Proceedings of the ACM
SIGMOD, 1993.
[6] Medha Bhadkamkar, Jorge Guerra, Luis Useche,
Sam Burnett, Jason Liptak, Raju Rangaswami, and
Vagelis Hristidis. BORG: Block-reORGanization for
Selfoptimizing Storage Systems. In Proc. of the
USENIX Annual Technical Conference on File and
Storage Technologies, February 2009.
[7] Sergey Brin, James Davis, and Hector Garcia-
Molina. Copy Detection Mechanisms for Digital
Documents. In Proc. of ACM SIGMOD, May 1995.
[8] Austin Clements, Irfan Ahmad, Murali Vilayannur,
and Jinyuan Li. Decentralized Deduplication in san
cluster file systems. In Proc. of the USENIX Annual
Technical Conference, June 2009.
[9] Burton H. Bloom. Space/time trade-offs in hash
coding with allowable errors. Communications of the
ACM, 13(7):422–426, 1970.
[10] Daniel Ellard, Jonathan Ledlie, Pia Malkani, and Margo Seltzer. Passive NFS Tracing of Email and Research Workloads. In Proc. of the USENIX Annual Technical Conference on File and Storage Technologies, March 2003.
[11] P. Kulkarni, F. Douglis, J. D. LaVoie, and J. M.
Tracey. Redundancy Elimination Within Large
Collections of Files. Proc. of the USENIX Annual
Technical Conference, 2004.
[12] Binny S. Gill. On multi-level exclusive caching: offline optimality and why promotions are better than demotions. In Proc. of the USENIX Annual Technical Conference on File and Storage Technologies, February 2008.
[13] Diwaker Gupta, Sangmin Lee, Michael Vrable,
Stefan Savage, Alex C. Snoeren, George Varghese,
Geoffrey Voelker, and Amin Vahdat. Difference
Engine: Harnessing Memory Redundancy in Virtual
Machines. Proc. of the USENIX OSDI, December
2008.
[14] Hai Huang, Wanda Hung, and Kang G. Shin.
FS2: Dynamic Data Replication In Free Disk Space
For Improving Disk Performance And Energy
Consumption. In Proc. of the ACM SOSP, October
2005.
[15] Jorge Guerra, Luis Useche, Medha Bhadkamkar,
Ricardo Koller, and Raju Rangaswami. The Case for
Active Block Layer Extensions. ACM Operating
Systems Review,42(6), October 2008.
[16] N. Jain, M. Dahlin, and R. Tewari. TAPER: Tiered Approach for Eliminating Redundancy in Replica Synchronization. In Proc. of the USENIX Conference on File and Storage Systems, 2005.
[17] Song Jiang, Feng Chen, and Xiaodong Zhang.
Clock-pro: An effective improvement of the clock
replacement. In Proc. of the USENIX Annual
Technical Conference, April 2005.
[18] Andrew Leung, Shankar Pasupathy, Garth
Goodson and Ethan Miller. Measurement and Analysis
of Large-Scale Network File System Workloads. Proc.
of the USENIX Annual Technical Conference, June
2008.
[19] Xuhui Li, Ashraf Aboulnaga, Kenneth Salem,
Aamer Sachedina, and Shaobo Gao. Second-tier cache
management using write hints. In Proc. of the
USENIX Annual Technical Conference on File and
Storage Technologies, 2005.
[20] Nimrod Megiddo and D. S. Modha. Arc: A self-
tuning, low overhead replacement cache. In Proc. of
USENIX Annual Technical Conference on File and
Storage Technologies, 2003.
[21] R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L.
Traiger. Evaluation techniques for storage hierarchies.
IBM Systems Journal, 9(2):78–117, 1970.
[22] Aayush Gupta, Raghav Pisolkar, Bhuvan
Urgaonkar, and Anand Sivasubramaniam. Leveraging
Value Locality in Optimizing NAND Flash-based
SSDs, 2011.
[23] Chi Zhang, Xiang Yu, Y. Wang. Configuring and
Scheduling an Eager-Writing Disk Array for a
Transaction Processing Workload. 2002.
[24] Kiran Srinivasan, Tim Bisson, Garth Goodson,
Kaladhar Voruganti. iDedup: Latency-aware, Inline
Data Deduplication for Primary Storage. 2012.
[25] Dina Bitton and Jim Gray. Disk Shadowing. In
Proc. of the International Conference on Very Large
Data Bases, 1988.
[26] Jiri Schindler, Sandip Shete, Keith A. Smith.
Improving Throughput for Small Disk Requests with
Proximal I/O. 2011.
[27] Mark Lillibridge, Kave Eshghi, Deepavali
Bhagwat, Vinay Deolalikar, Greg Trezise, and Peter
Camble. Sparse indexing: large scale, inline
deduplication using sampling and locality. In Proc. of
the USENIX Annual Technical Conference on File
and Storage Technologies, February 2009.
 
EVALUATE DATABASE COMPRESSION PERFORMANCE AND PARALLEL BACKUP
EVALUATE DATABASE COMPRESSION PERFORMANCE AND PARALLEL BACKUPEVALUATE DATABASE COMPRESSION PERFORMANCE AND PARALLEL BACKUP
EVALUATE DATABASE COMPRESSION PERFORMANCE AND PARALLEL BACKUPijdms
 
iaetsd Controlling data deuplication in cloud storage
iaetsd Controlling data deuplication in cloud storageiaetsd Controlling data deuplication in cloud storage
iaetsd Controlling data deuplication in cloud storageIaetsd Iaetsd
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
Efficient usage of memory management in big data using “anti caching”
Efficient usage of memory management in big data using “anti caching”Efficient usage of memory management in big data using “anti caching”
Efficient usage of memory management in big data using “anti caching”eSAT Journals
 
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...IOSR Journals
 
Hybrid Task Scheduling Approach using Gravitational and ACO Search Algorithm
Hybrid Task Scheduling Approach using Gravitational and ACO Search AlgorithmHybrid Task Scheduling Approach using Gravitational and ACO Search Algorithm
Hybrid Task Scheduling Approach using Gravitational and ACO Search AlgorithmIRJET Journal
 
Query optimization in oodbms identifying subquery for query management
Query optimization in oodbms identifying subquery for query managementQuery optimization in oodbms identifying subquery for query management
Query optimization in oodbms identifying subquery for query managementijdms
 
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENT
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENTQUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENT
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENTcsandit
 
AN ENERGY EFFICIENT L2 CACHE ARCHITECTURE USING WAY TAG INFORMATION UNDER WR...
AN ENERGY EFFICIENT L2 CACHE ARCHITECTURE USING WAY TAG INFORMATION UNDER  WR...AN ENERGY EFFICIENT L2 CACHE ARCHITECTURE USING WAY TAG INFORMATION UNDER  WR...
AN ENERGY EFFICIENT L2 CACHE ARCHITECTURE USING WAY TAG INFORMATION UNDER WR...Vijay Prime
 
Study on potential capabilities of a nodb system
Study on potential capabilities of a nodb systemStudy on potential capabilities of a nodb system
Study on potential capabilities of a nodb systemijitjournal
 
An Enhanced Cloud Backed Frugal File System
An Enhanced Cloud Backed Frugal File SystemAn Enhanced Cloud Backed Frugal File System
An Enhanced Cloud Backed Frugal File SystemIRJET Journal
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...acijjournal
 

IDL - International Digital Library Of Technology & Research
Volume 1, Issue 5, May 2017. Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
Copyright@IDL-2017

Secure and Efficient Client and Server Side Data Deduplication to Reduce Storage in Remote Cloud Computing Systems

Hemanth Chandra N1, Sahana D. Gowda2
Dept. of Computer Science, B.N.M Institute of Technology, Bangalore, India

Abstract: Duplication of data in storage systems is becoming an increasingly common problem. This system introduces I/O Deduplication, a storage optimization that utilizes content similarity to improve I/O performance by eliminating I/O operations and reducing the mechanical delays during I/O operations, and that shares data with existing users if a duplicate is found on the client or server side. I/O Deduplication consists of three main techniques: content-based caching, dynamic replica retrieval and selective duplication. Each of these techniques is motivated by our observations of I/O workload traces obtained from actively used production storage systems, all of which revealed surprisingly high levels of content similarity for both stored and accessed data.

Keywords: Deduplication, POD, Data Redundancy, Storage Optimization, iDedup, I/O Deduplication, I/O Performance.

1. INTRODUCTION

Duplication of data in primary storage systems is quite common due to the technological trends that have been driving storage capacity consolidation. The elimination of duplicate content at both the file and block levels to improve storage space utilization is an active area of research. Indeed, eliminating most duplicate content is inevitable in capacity-sensitive applications such as archival storage, for cost effectiveness.
On the other hand, there exist systems with a moderate degree of content similarity in their primary storage, such as email servers, virtualized servers and NAS devices running file and version control servers. In email servers, mailing lists, circulated attachments and spam can lead to duplication. Virtual machines may run similar software and thus create collocated duplicate content across their virtual disks. Finally, file and version control servers of collaborative groups often store copies of the same documents, sources and executables. In such systems, if the degree of content similarity is not overwhelming, eliminating duplicate data may not be a primary concern.

Gray and Shenoy [1] have pointed out that, given the technology trends for price-capacity and price-performance of memory/disk sizes and disk accesses respectively, disk data must "cool" at the rate of 10X per decade. They suggest data replication as a means to this end. An instantiation of this suggestion is the intrinsic replication of data created by consolidation, as seen now in many storage systems, including the ones illustrated earlier. Here, intrinsic (application- or user-generated) data replication is meant, as opposed to forced (system-generated) redundancy such as in a RAID-1 storage system. In such systems, capacity constraints are invariably secondary to I/O performance.
On-disk duplication of content and I/O traces obtained from three varied production systems at Florida International University (FIU), a virtualized host running two department web servers, the department email server, and a file server for our research group, have been analyzed. Three observations emerge from the analysis of these traces. First, the analysis revealed significant levels of both disk static similarity and workload static similarity within each of these systems. Disk static similarity is an indicator of the amount of duplicate content in the storage medium, while workload static similarity indicates the degree of on-disk duplicate content accessed by the I/O workload. These similarity measures are defined formally in § 2. Second, a consistent and marked discrepancy was discovered between reuse distances for sector and content in the I/O accesses on these systems, indicating that content is reused more frequently than sectors. Third, there is significant overlap in the content accessed over successive intervals of longer time frames such as days or weeks.

Based on these observations, the premise that intrinsic content similarity in storage systems and access to replicated content within I/O workloads can both be utilized to improve I/O performance is explored. In doing so, I/O Deduplication, a storage optimization that utilizes content similarity to either eliminate I/O operations altogether or optimize the resulting disk head movement within the storage system, is designed and evaluated.
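The second observation, that content exhibits shorter reuse distances than sectors, can be quantified with a simple stack-distance computation over a trace of (sector, content-fingerprint) pairs. The sketch below uses an illustrative toy trace, not the FIU traces:

```python
def reuse_distances(trace):
    """For each access, the number of distinct keys seen since the
    previous access to the same key (stack distance); inf on first use."""
    stack, out = [], []              # stack: most-recent-first key order
    for key in trace:
        if key in stack:
            out.append(stack.index(key))
            stack.remove(key)
        else:
            out.append(float('inf'))
        stack.insert(0, key)
    return out

# Illustrative trace of (sector, content-hash) pairs: sectors 10 and 30
# hold identical content H1, so content reuse is denser than sector reuse.
trace = [(10, "H1"), (20, "H2"), (30, "H1"), (10, "H1"), (20, "H2")]
sector_d = reuse_distances([s for s, _ in trace])
content_d = reuse_distances([h for _, h in trace])
```

On this toy trace the content stream yields finite reuse distances of 1, 0 and 1, while the sector stream yields only 2 and 2, mirroring the sector/content discrepancy described above.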
I/O Deduplication comprises three key techniques: (i) content-based caching, which uses the popularity of data content rather than data location in making caching decisions; (ii) dynamic replica retrieval, which, upon a cache miss for a read operation, dynamically chooses to retrieve the content replica that minimizes disk head movement; and (iii) selective duplication, which dynamically replicates frequently accessed content in scratch space distributed over the entire storage medium to increase the effectiveness of dynamic replica retrieval.

2. RELATED WORK

This paper is related to work on Deduplication concepts in primary memory. Nimrod Megiddo and Dharmendra S. Modha [20] discussed the problem of cache management in a demand-paging scenario with uniform page sizes. They propose a new cache management policy, Adaptive Replacement Cache (ARC), that has several advantages. In response to evolving and changing access patterns, ARC dynamically, adaptively and continually balances between the recency and frequency components in an online, self-tuning fashion. ARC uses a learning rule to adaptively and continually revise its assumptions about the workload. ARC is empirically universal: it performs as well as a fixed replacement policy even when the latter uses the best workload-specific tuning parameter selected in an offline fashion. Consequently, ARC works uniformly well across varied workloads and cache sizes without any need for workload-specific a priori knowledge or tuning. Policies such as LRU-2, 2Q, LRFU and LIRS require user-defined parameters and, unfortunately, no single choice works uniformly well across different workloads and cache sizes. ARC is simple to implement and, like LRU, has constant complexity per request; in comparison, LRU-2 and LRFU both require logarithmic time complexity in the cache size.
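Technique (i) above shares ARC's goal of improving hit rates, but keys the cache on content rather than location, so reads of distinct sectors holding duplicate data can be served by a single cached copy. A minimal sketch, with illustrative class and method names that are not the paper's implementation:

```python
import hashlib
from collections import OrderedDict

class ContentCache:
    """Content-based read cache (sketch): blocks are cached under a
    fingerprint of their content, so sectors with duplicate data share
    one cached copy. LRU eviction via OrderedDict ordering."""

    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk  # fallback: sector -> bytes
        self.sector_to_hash = {}              # metadata: sector -> fingerprint
        self.cache = OrderedDict()            # fingerprint -> block data

    def note_content(self, sector, data):
        # Record a sector's fingerprint, e.g. from an offline disk scan
        # or by intercepting writes; the data itself is not cached here.
        self.sector_to_hash[sector] = hashlib.sha1(data).hexdigest()

    def read(self, sector):
        h = self.sector_to_hash.get(sector)
        if h is not None and h in self.cache:
            self.cache.move_to_end(h)         # hit, possibly via a duplicate
            return self.cache[h]
        data = self.read_from_disk(sector)    # miss: one real disk I/O
        h = hashlib.sha1(data).hexdigest()
        self.sector_to_hash[sector] = h
        self.cache[h] = data
        self.cache.move_to_end(h)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least-recently-used content
        return data

# Demo: sectors 1 and 2 hold identical content; one physical read serves both.
reads = []
disk = {1: b"X", 2: b"X", 3: b"Y"}
cc = ContentCache(capacity=4, read_from_disk=lambda s: reads.append(s) or disk[s])
for s, d in disk.items():
    cc.note_content(s, d)
first, second = cc.read(1), cc.read(2)
```

A location-indexed cache would miss on the read of sector 2; here the read is satisfied from the copy cached for sector 1, eliminating one I/O operation.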
ARC is also scan-resistant: it allows one-time sequential requests to pass through without polluting the cache. On real-life traces drawn from numerous domains, ARC leads to substantial performance gains over LRU for a wide range of cache sizes.

Aayush Gupta et al. [22] designed CA-SSD, which employs content-addressable storage (CAS) to exploit value locality. CA-SSD places its enhancements primarily in the flash translation layer (FTL) with minimal additional hardware, suggesting its feasibility. Using three real-world workloads with content information, CA-SSD is built on statistical characterizations of two aspects of value locality: value popularity and temporal value locality. CA-SSD reduces average response times by about 59-84 percent compared to traditional SSDs. Even for workloads with little or no value locality, CA-SSD continues to offer performance comparable to a traditional SSD. These findings advocate the adoption of CAS in SSDs, paving the way for a new generation of these devices.

Chi Zhang, Xiang Yu and Y. Wang [23] answered two key questions. First, since both eager-writing and mirroring rely on extra capacity to deliver performance improvements, how should competing resource demands be satisfied given a fixed amount of total disk space? Second, since eager-writing allows data to be dynamically located, how can this high degree of location independence be exploited in an intelligent disk scheduler? They addressed these two questions and compared the resulting EW-Array prototype against conventional approaches. The experimental results demonstrate that the eager-writing disk array is an effective approach to providing scalable performance for an important class of transaction-processing applications.

Kiran Srinivasan et al. [24] proposed iDedup, an inline Deduplication solution for primary workloads that minimizes extra I/Os and seeks. Their algorithm is based on two key insights from real-world workloads: (i) spatial locality exists in duplicated primary data, and (ii) temporal locality exists in the access patterns of duplicated data. Using the first insight, only sequences of disk blocks are selectively deduplicated; this reduces fragmentation and amortizes the seeks caused by Deduplication. The second insight allows the expensive on-disk Deduplication metadata to be replaced with a smaller in-memory cache.
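The sequence-based selection just described can be sketched as follows: only runs of incoming blocks whose fingerprints are already stored at consecutive on-disk locations, and whose length meets a threshold, are deduplicated. The function name, block layout and threshold value are illustrative, not iDedup's actual code:

```python
def plan_writes(fps, store, threshold):
    """Split an incoming write (a list of block fingerprints) into
    ('dedup', run) and ('write', run) actions. A run is deduplicated
    only if its fingerprints map to consecutive on-disk blocks and the
    run is at least `threshold` blocks long (limits fragmentation)."""
    plan, i, n = [], 0, len(fps)
    while i < n:
        if fps[i] in store:
            j = i
            # extend while the next fingerprint sits on the next disk block
            while (j + 1 < n and fps[j + 1] in store
                   and store[fps[j + 1]] == store[fps[j]] + 1):
                j += 1
            action = 'dedup' if j - i + 1 >= threshold else 'write'
            plan.append((action, fps[i:j + 1]))
            i = j + 1
        else:
            j = i
            while j < n and fps[j] not in store:
                j += 1
            plan.append(('write', fps[i:j]))   # new content: write in place
            i = j
    return plan

# store: fingerprint -> physical block; 'a', 'b', 'c' are laid out contiguously
store = {"a": 100, "b": 101, "c": 102, "x": 500}
plan = plan_writes(["a", "b", "c", "q", "x"], store, threshold=2)
# -> [('dedup', ['a', 'b', 'c']), ('write', ['q']), ('write', ['x'])]
```

Note that the isolated duplicate 'x' is written anyway: deduplicating it would insert a seek into an otherwise sequential read, which is exactly the fragmentation the threshold avoids.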
These techniques allow capacity savings to be traded off against performance, as demonstrated in an evaluation with real-world workloads: iDedup achieves 60-70 percent of the maximum Deduplication with less than 5 percent CPU overhead and a 2-4 percent latency impact.

Jiri Schindler et al. [26] introduce proximal I/O, a new technique for improving random disk I/O performance in file systems. The key enabling technology for proximal I/O is the ability of disk drives to retire multiple I/Os, spread across dozens of tracks, in a single revolution. Compared to traditional update-in-place or write-anywhere file systems, this technique can provide a nearly seven-fold improvement in random I/O performance while maintaining (near) sequential on-disk layout. The paper quantifies proximal I/O performance and proposes a simple data layout engine that uses a flash-memory-based write cache to aggregate random updates until they have sufficient density to exploit proximal I/O. The results show that with a cache of just 1 percent of the overall disk-based storage capacity, it is possible to service 5.3 user I/O requests per revolution for a random-update workload. On an aged file system, the layout can sustain serial read bandwidth within 3 percent of the best case. Despite using flash memory, the overall system cost is just one third of that of a system with the requisite number of spindles to achieve the equivalent number of random I/O operations.

2.1 EXISTING SYSTEM

Existing data Deduplication schemes for primary storage, such as iDedup and Offline-Dedupe, are capacity oriented in that they focus on storage capacity savings: they select only the large requests to deduplicate and bypass all the small requests (e.g., 4 KB, 8 KB or less).
The rationale is that small I/O requests account for only a tiny fraction of the storage capacity requirement, making Deduplication on them unprofitable and potentially counterproductive given the substantial Deduplication overhead involved. However, previous workload studies have revealed that small files dominate in primary storage systems (more than 50 percent) and are at the root of the system performance bottleneck. Furthermore, due to the buffer effect, primary storage workloads exhibit obvious I/O burstiness. Motivated by the important performance issues of primary storage in the Cloud and the Deduplication-induced problems above, a Performance-Oriented data Deduplication scheme, called POD, rather than a capacity-oriented one (e.g., iDedup), is proposed to improve the I/O performance of primary storage systems in the Cloud. Figure 1 represents the architecture of POD. By considering the workload characteristics, POD takes a two-pronged approach to improving the performance of primary storage systems and minimizing the performance overhead of Deduplication: a request-based selective Deduplication technique, called Select-Dedupe, to alleviate data fragmentation, and an adaptive memory management scheme, called iCache, to ease the memory contention between bursty read traffic and bursty write traffic.

2.2 PROPOSED SYSTEM (POD vs iDedup)

A possible future direction is to optionally coalesce, or even eliminate altogether, write I/O operations for content that is already duplicated elsewhere on the disk, or alternatively to direct such writes to alternate locations in the scratch space.

Figure 1. Architecture diagram of POD.

While the first option might seem similar to data Deduplication at a high level, the primary focus here is on the performance implications of such optimizations rather than on capacity improvements. Any optimization for writes affects the read-side optimizations of I/O Deduplication, so a careful analysis and evaluation of the trade-off points in this design space is important; the system also shares data with existing users if a duplicate is found on the client or server side.

Advantages:
1. Requires less space on the storage server.
2. Data Deduplication is performed completely across all blocks.

3. IMPLEMENTATION

POD, a performance-oriented Deduplication scheme, is proposed to improve the performance of primary storage systems in the Cloud by leveraging data Deduplication on the I/O path to remove redundant write requests while also saving storage space.
Figure 2 shows the sequence diagram, from which one can understand how the flow of control proceeds between the modules. POD takes a request-based selective deduplication approach (Select-Dedupe) to deduplicating the I/O redundancy on the critical I/O path in such a way that it minimizes the data fragmentation problem. Meanwhile, an intelligent cache management scheme (iCache) is employed in POD to further improve read performance and increase space savings by adapting to I/O burstiness. Our extensive trace-driven evaluations show that POD significantly improves the performance and saves the capacity of primary storage systems in the Cloud.
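One plausible realization of the iCache idea is sketched below. Everything here is an assumption for illustration, not POD's actual implementation: the rebalancing policy, the 100-request window, and all names are hypothetical. The sketch shifts cache capacity toward whichever traffic, read or write, has recently dominated, which is one simple way to adapt to I/O burstiness.

```python
from collections import OrderedDict

# Illustrative sketch in the spirit of iCache: a fixed budget of cache
# entries is split between a read cache and the write-side working set,
# and the split is periodically rebalanced to follow recent traffic.


class AdaptiveCache:
    def __init__(self, capacity=100):
        self.capacity = capacity
        self.read_share = capacity // 2   # entries reserved for reads
        self.read_cache = OrderedDict()   # LRU order: block id -> data
        self.reads = 0
        self.writes = 0

    def record(self, is_read: bool):
        if is_read:
            self.reads += 1
        else:
            self.writes += 1
        # Every 100 requests, shift the split toward the bursty side,
        # keeping at least 10 entries for each side.
        if self.reads + self.writes >= 100:
            frac = self.reads / (self.reads + self.writes)
            self.read_share = max(10, min(self.capacity - 10,
                                          int(self.capacity * frac)))
            self.reads = self.writes = 0
            while len(self.read_cache) > self.read_share:
                self.read_cache.popitem(last=False)  # evict LRU entries

    def cache_read_block(self, block_id, data):
        self.read_cache[block_id] = data
        self.read_cache.move_to_end(block_id)
        while len(self.read_cache) > self.read_share:
            self.read_cache.popitem(last=False)
```

Under a read burst the read cache grows to most of the budget; under a write burst it shrinks, freeing memory for the write-side traffic.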
Figure 2. Sequence diagram of POD.

iDedup

The iDedup system is an inline deduplication system specifically targeting latency-sensitive, primary storage workloads, and is compared here with the latest techniques such as POD, also known as I/O Deduplication. With latency-sensitive workloads, inline deduplication faces many challenges: fragmentation leading to extra disk seeks for reads, deduplication processing overheads on the critical path, and extra latency caused by I/Os for dedup-metadata management.

3.1 COMPARATIVE ANALYSIS

In this section, the performance of the POD and iDedup deduplication models is evaluated through extensive trace-driven experiments.

Figure 3. The time delay performance of the different deduplication schemes: (a) time delay in iDedup; (b) time delay in POD.

A prototype of POD has been implemented as a module in the Linux operating system, and trace-driven experiments are used to evaluate its effectiveness and efficiency. In this paper, POD is compared with the capacity-oriented scheme iDedup. The two models (POD and iDedup) are compared on two parameters: time delay and CPU utilization.
Figure 4. The CPU utilization performance of the different deduplication schemes: (a) CPU utilization in iDedup; (b) CPU utilization in POD.

For the first parameter comparison, i.e., time delay, four production files were used in the trace-driven evaluation. Figure 3(a) shows the time delay for the iDedup model, and Figure 3(b) shows the time delay for the same files in the POD model. For both deduplicator models, the same files were uploaded, named cmain, owner, clog, and search. The results show that POD is more efficient than iDedup. For the second parameter comparison, i.e., CPU utilization, a single file named abc.txt was uploaded to both the POD and iDedup models. Figure 4(a) shows the CPU utilization in the iDedup model, and Figure 4(b) shows the CPU utilization in the POD model. The results again show that POD is more efficient than the iDedup model.

4. CONCLUSION

System and storage consolidation trends are driving increased duplication of data within storage systems. Past efforts have been primarily directed towards the elimination of such duplication for improving storage capacity utilization. With I/O Deduplication, a contrary view is taken: intrinsic duplication in a class of systems that are not capacity-bound can be effectively utilized to improve I/O performance, the traditional Achilles' heel of storage systems. Three techniques contained within I/O Deduplication work together to either optimize I/O operations or eliminate them altogether. An in-depth evaluation of these mechanisms revealed that together they reduced average disk I/O times by 28-47%, a large improvement, all of which can directly impact the overall application-level performance of disk-I/O-bound systems.
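One of these techniques, content-based caching, can be sketched as a cache indexed by a hash of block content rather than by sector address, so duplicate blocks stored at different sectors share a single cached copy and raise the hit rate. The sketch below is illustrative only; the class and its fields are assumptions, not the paper's implementation.

```python
import hashlib


class ContentBasedCache:
    """Cache keyed by content fingerprint instead of sector number, so
    that duplicate blocks at different sectors share one cache entry."""

    def __init__(self):
        self.sector_to_fp = {}   # sector -> content fingerprint
        self.fp_to_data = {}     # fingerprint -> cached block data
        self.hits = 0
        self.misses = 0

    def read(self, sector, disk_read):
        fp = self.sector_to_fp.get(sector)
        if fp is not None and fp in self.fp_to_data:
            self.hits += 1                  # served from cache
            return self.fp_to_data[fp]
        self.misses += 1
        data = disk_read(sector)            # fall back to the disk
        fp = hashlib.sha256(data).hexdigest()
        self.sector_to_fp[sector] = fp
        self.fp_to_data[fp] = data          # duplicates collapse to one copy
        return data
```

A sector-based cache would hold one copy per sector; here, once two sectors are known to carry identical content, either one can be served from the single shared entry, which is the source of the higher hit rates reported for this technique.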
The content-based caching mechanism increased memory caching effectiveness, raising cache hit rates by 10% to 4x for read operations when compared to traditional sector-based caching. Head-position-aware dynamic replica retrieval directed I/O operations to alternate locations on the fly and additionally reduced I/O times by 10-20%. Selective duplication created additional replicas of popular content during periods of low foreground I/O activity and further improved the effectiveness of dynamic replica retrieval by 23-35%.

FUTURE WORK

I/O Deduplication opens up several directions for future work. One avenue is to explore content-based optimizations for write I/O operations: optionally coalescing, or even eliminating altogether, write I/O operations for content that is already duplicated elsewhere on the disk, or alternatively directing such writes to alternate locations in the scratch space. While the first option might seem similar to data deduplication at a high level, a primary focus on the performance implications of such optimizations, rather than capacity improvements, is suggested. Any optimization for writes affects the read-side optimizations of I/O Deduplication, and a careful analysis and evaluation of the trade-off points in this design space is important. The system also shares data with existing users if duplication is found on the client or server side.

REFERENCES

[1] Jim Gray and Prashant Shenoy. Rules of Thumb in Data Engineering. In Proc. of the IEEE International Conference on Data Engineering, February 2000.
[2] Charles B. Morrey III and Dirk Grunwald. Peabody: The Time Travelling Disk. In Proc. of the IEEE/NASA MSST, 2003.
[3] Windsor W. Hsu, Alan Jay Smith, and Honesty C. Young. The Automatic Improvement of Locality in Storage Systems. ACM Transactions on Computer Systems, 23(4):424-473, November 2005.
[4] S. Quinlan and S. Dorward. Venti: A New Approach to Archival Storage. In Proc. of the USENIX Conference on File and Storage Technologies, January 2002.
[5] Cyril U. Orji and Jon A. Solworth. Doubly Distorted Mirrors. In Proc. of the ACM SIGMOD, 1993.
[6] Medha Bhadkamkar, Jorge Guerra, Luis Useche, Sam Burnett, Jason Liptak, Raju Rangaswami, and Vagelis Hristidis. BORG: Block-reORGanization for Self-optimizing Storage Systems. In Proc. of the USENIX Conference on File and Storage Technologies, February 2009.
[7] Sergey Brin, James Davis, and Hector Garcia-Molina. Copy Detection Mechanisms for Digital Documents. In Proc. of ACM SIGMOD, May 1995.
[8] Austin Clements, Irfan Ahmad, Murali Vilayannur, and Jinyuan Li. Decentralized Deduplication in SAN Cluster File Systems. In Proc. of the USENIX Annual Technical Conference, June 2009.
[9] Burton H. Bloom.
Space/time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM, 13(7):422-426, 1970.
[10] Daniel Ellard, Jonathan Ledlie, Pia Malkani, and Margo Seltzer. Passive NFS Tracing of Email and Research Workloads. In Proc. of the USENIX Conference on File and Storage Technologies, March 2003.
[11] P. Kulkarni, F. Douglis, J. D. LaVoie, and J. M. Tracey. Redundancy Elimination Within Large Collections of Files. In Proc. of the USENIX Annual Technical Conference, 2004.
[12] Binny S. Gill. On Multi-level Exclusive Caching: Offline Optimality and Why Promotions Are Better Than Demotions. In Proc. of the USENIX Conference on File and Storage Technologies, February 2008.
[13] Diwaker Gupta, Sangmin Lee, Michael Vrable, Stefan Savage, Alex C. Snoeren, George Varghese, Geoffrey Voelker, and Amin Vahdat. Difference Engine: Harnessing Memory Redundancy in Virtual Machines. In Proc. of the USENIX OSDI, December 2008.
[14] Hai Huang, Wanda Hung, and Kang G. Shin. FS2: Dynamic Data Replication in Free Disk Space for Improving Disk Performance and Energy Consumption. In Proc. of the ACM SOSP, October 2005.
[15] Jorge Guerra, Luis Useche, Medha Bhadkamkar, Ricardo Koller, and Raju Rangaswami. The Case for Active Block Layer Extensions. ACM Operating Systems Review, 42(6), October 2008.
[16] N. Jain, M. Dahlin, and R. Tewari. TAPER: Tiered Approach for Eliminating Redundancy in
Replica Synchronization. In Proc. of the USENIX Conference on File and Storage Technologies, 2005.
[17] Song Jiang, Feng Chen, and Xiaodong Zhang. CLOCK-Pro: An Effective Improvement of the CLOCK Replacement. In Proc. of the USENIX Annual Technical Conference, April 2005.
[18] Andrew Leung, Shankar Pasupathy, Garth Goodson, and Ethan Miller. Measurement and Analysis of Large-Scale Network File System Workloads. In Proc. of the USENIX Annual Technical Conference, June 2008.
[19] Xuhui Li, Ashraf Aboulnaga, Kenneth Salem, Aamer Sachedina, and Shaobo Gao. Second-tier Cache Management Using Write Hints. In Proc. of the USENIX Conference on File and Storage Technologies, 2005.
[20] Nimrod Megiddo and D. S. Modha. ARC: A Self-tuning, Low Overhead Replacement Cache. In Proc. of the USENIX Conference on File and Storage Technologies, 2003.
[21] R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation Techniques for Storage Hierarchies. IBM Systems Journal, 9(2):78-117, 1970.
[22] Aayush Gupta, Raghav Pisolkar, Bhuvan Urgaonkar, and Anand Sivasubramaniam. Leveraging Value Locality in Optimizing NAND Flash-based SSDs. 2011.
[23] Chi Zhang, Xiang Yu, and Y. Wang. Configuring and Scheduling an Eager-Writing Disk Array for a Transaction Processing Workload. 2002.
[24] Kiran Srinivasan, Tim Bisson, Garth Goodson, and Kaladhar Voruganti. iDedup: Latency-aware, Inline Data Deduplication for Primary Storage. 2012.
[25] Dina Bitton and Jim Gray. Disk Shadowing. In Proc. of the International Conference on Very Large Data Bases, 1988.
[26] Jiri Schindler, Sandip Shete, and Keith A. Smith. Improving Throughput for Small Disk Requests with Proximal I/O. 2011.
[27] Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat, Vinay Deolalikar, Greg Trezise, and Peter Camble. Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality. In Proc. of the USENIX Conference on File and Storage Technologies, February 2009.