Head office: 3rd floor, Krishna Reddy Buildings, Opp. ICICI ATM, Ramalingapuram, Nellore
www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457
Reducing Fragmentation for In-line Deduplication
Backup Storage via Exploiting Backup History
and Cache Knowledge
ABSTRACT
In backup systems, the chunks of each backup are physically scattered after
deduplication, which causes a challenging fragmentation problem. We observe that
the fragmentation manifests in two kinds of containers: sparse containers and
out-of-order containers. Sparse containers decrease restore performance and garbage
collection efficiency, while out-of-order containers decrease restore performance if
the restore cache is small. To reduce the fragmentation, we propose a History-Aware
Rewriting algorithm (HAR) and a Cache-Aware Filter (CAF). HAR exploits historical
information in backup systems to accurately identify and reduce sparse containers,
and CAF exploits restore cache knowledge to identify the out-of-order containers
that hurt restore performance. CAF efficiently complements HAR in datasets
where out-of-order containers are dominant. To reduce the metadata overhead of
garbage collection, we further propose a Container-Marker Algorithm (CMA) that
identifies valid containers instead of valid chunks. Extensive experimental results
from real-world datasets show that HAR significantly improves restore performance
by a factor of 2.84-175.36 at the cost of rewriting only 0.5-2.03% of the data.
Existing System:
In backup systems, the chunks of each backup are physically scattered after
deduplication, which causes a challenging fragmentation problem. We observe that
the fragmentation manifests in two kinds of containers: sparse containers and
out-of-order containers. Sparse containers decrease restore performance and garbage
collection efficiency, while out-of-order containers decrease restore performance
if the restore cache is small.
Proposed System
To reduce the fragmentation, we propose a History-Aware Rewriting algorithm (HAR)
and a Cache-Aware Filter (CAF). HAR exploits historical information in backup
systems to accurately identify and reduce sparse containers, and CAF exploits
restore cache knowledge to identify the out-of-order containers that hurt restore
performance. CAF efficiently complements HAR in datasets where out-of-order
containers are dominant. To reduce the metadata overhead of garbage collection, we
further propose a Container-Marker Algorithm (CMA) that identifies valid containers
instead of valid chunks. Extensive experimental results from real-world datasets
show that HAR significantly improves restore performance by a factor of 2.84-175.36
at the cost of rewriting only 0.5-2.03% of the data.
We also propose a hybrid rewriting algorithm as a complement to HAR to reduce the
negative impact of out-of-order containers. HAR, combined with an optimal restore
cache (OPT), improves restore performance by a factor of 2.84-175.36 at an
acceptable cost in deduplication ratio, and outperforms the state-of-the-art
rewriting algorithms in terms of both deduplication ratio and restore performance.
The hybrid scheme further improves restore performance in datasets where
out-of-order containers are dominant. To avoid a significant decrease of the
deduplication ratio in the hybrid scheme, we develop the Cache-Aware Filter (CAF)
to exploit restore cache knowledge: with the help of CAF, the hybrid scheme
significantly improves the deduplication ratio without decreasing restore
performance. Note that CAF can also be used as an optimization of existing
rewriting algorithms.
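Since CAF is described as a layer on top of an existing rewriting algorithm, the sketch below shows one way such a filter can be structured: the backup process feeds it the container ID of every chunk in stream order, the filter simulates a small restore cache of container IDs, and it vetoes a proposed rewrite whose container would be resident in the restore cache anyway. The LRU simulation, the slot count, and all names are illustrative assumptions, not the authors' exact design.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a cache-aware filter layered on top of a rewriting algorithm: it simulates
 *  the restore cache as an LRU of container IDs while the backup stream is processed,
 *  and denies a proposed rewrite whose container would be in the restore cache anyway.
 *  The LRU simulation, the slot count, and all names are illustrative assumptions. */
class CacheAwareFilterSketch {
    private final LinkedHashMap<Long, Boolean> simulatedCache;

    CacheAwareFilterSketch(final int cacheSlots) {
        // An access-ordered LinkedHashMap doubles as a tiny LRU cache of container IDs.
        this.simulatedCache = new LinkedHashMap<Long, Boolean>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                return size() > cacheSlots;
            }
        };
    }

    /** Called for every chunk of the backup stream, in stream order. */
    void onChunk(long containerId) {
        simulatedCache.put(containerId, Boolean.TRUE);
    }

    /** The underlying rewriting algorithm proposes a rewrite; the filter denies it when
     *  the container is expected to be resident in the restore cache, preserving the
     *  deduplication ratio without hurting restore performance. */
    boolean shouldRewrite(long containerId, boolean proposedByRewriter) {
        return proposedByRewriter && !simulatedCache.containsKey(containerId);
    }
}
```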
Architecture:
MODULES
 Storage system
 Data Deduplication
 Chunk fragmentation
 Performance evaluation
Storage system
Cloud storage is a model of data storage where the digital data is stored in
logical pools, the physical storage spans multiple servers (and often locations),
and the physical environment is typically owned and managed by a hosting
company. These cloud storage providers are responsible for keeping the data
available and accessible, and the physical environment protected and running.
People and organizations buy or lease storage capacity from the providers to
store end-user, organization, or application data.
Data Deduplication
After multiple backups, the chunks of a backup unfortunately become physically
scattered across different containers, which is known as fragmentation. The negative
impacts of the fragmentation are two-fold. First, the fragmentation severely
decreases restore performance. Although restore operations are infrequent, they are
important and a main concern of users. Moreover, data replication, which is
important for disaster recovery, requires reconstructing original backup streams
from the deduplication system and thus suffers from a performance problem similar to
the restore operation. Second, the fragmentation results in invalid chunks (chunks
not referenced by any backup) becoming physically scattered across different
containers when users delete expired backups. Existing garbage collection
solutions first identify the valid chunks and the containers holding only a few
valid chunks (i.e., reference management). Then, a merging operation copies the
valid chunks in the identified containers to new containers. Finally, the
identified containers are reclaimed. Unfortunately, the metadata space overhead
of reference management is proportional to the number of chunks, and the merging
operation is the most time-consuming phase of garbage collection. We observe that
the fragmentation comes in two categories of containers: sparse containers and
out-of-order containers, which have different negative impacts and require
dedicated solutions.
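The merge-based garbage collection just described can be sketched as follows: count the valid (still referenced) bytes per container from the recipes of live backups, then select low-utilization containers whose valid chunks are copied out before the containers are reclaimed. The container size, the merge threshold, and all names are illustrative assumptions; the per-chunk bookkeeping also shows why the metadata overhead grows with the number of chunks.

```java
import java.util.*;

/** Minimal sketch of merge-based garbage collection. Container size, merge threshold,
 *  and names are illustrative assumptions, not taken from the paper's implementation. */
class ContainerGcSketch {
    static final long CONTAINER_SIZE = 4L << 20;   // assume 4 MB containers
    static final double MERGE_THRESHOLD = 0.5;     // merge containers below 50% valid data

    /** Chunk reference as it appears in a backup recipe. */
    static class ChunkRef {
        final String fingerprint; final long containerId; final long size;
        ChunkRef(String fp, long cid, long sz) { fingerprint = fp; containerId = cid; size = sz; }
    }

    /** containerId -> bytes of chunks still referenced by at least one live backup.
     *  The per-chunk fingerprint set grows with the number of chunks, which is exactly
     *  the metadata overhead of reference management mentioned in the text. */
    static Map<Long, Long> validBytes(Collection<List<ChunkRef>> liveRecipes) {
        Map<Long, Long> valid = new HashMap<>();
        Set<String> seen = new HashSet<>();
        for (List<ChunkRef> recipe : liveRecipes)
            for (ChunkRef c : recipe)
                if (seen.add(c.fingerprint))           // count each valid chunk once
                    valid.merge(c.containerId, c.size, Long::sum);
        return valid;
    }

    /** Containers holding only a few valid chunks: copy their valid chunks to new
     *  containers (merge), then reclaim the old containers. */
    static List<Long> containersToMerge(Map<Long, Long> valid) {
        List<Long> victims = new ArrayList<>();
        for (Map.Entry<Long, Long> e : valid.entrySet())
            if ((double) e.getValue() / CONTAINER_SIZE < MERGE_THRESHOLD)
                victims.add(e.getKey());
        return victims;
    }
}
```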
Chunk fragmentation
The fragmentation problem in deduplication systems has received much
attention. iDedup eliminates sequences of duplicate chunks in the context of
primary storage systems. Nam et al. propose a quantitative metric to measure the
fragmentation level of deduplication systems, and a selective deduplication
scheme for backup workloads. SAR stores hot chunks in SSDs to accelerate
reads. RevDedup employs a hybrid inline and out-of-line deduplication scheme
to improve the restore performance of the latest backups. The Context-Based
Rewriting algorithm (CBR) [17] and the capping algorithm (Capping) are recently
proposed rewriting algorithms that address the fragmentation problem. Both of them
buffer a small part of the on-going backup stream during a backup and identify
fragmented chunks within the buffer (generally 10-20 MB). For example,
Capping divides the backup stream into fixed-sized segments (e.g., 20 MB) and
estimates the fragmentation within each segment. Capping limits the
maximum number (say T) of containers a segment can refer to: if a new
segment refers to N containers and N > T, the chunks in the N - T containers that
hold the fewest chunks of the segment are rewritten. Reference management for
garbage collection is complicated since each chunk can be referenced by
multiple backups. Offline approaches traverse all fingerprints (including the
fingerprint index and recipes) when the system is idle. For example, Botelho et
al. [14] build a perfect hash vector as a compact representation of all chunks.
Since recipes occupy significantly large storage space, the traversal
operation is time-consuming. Inline approaches maintain additional
metadata during the backup to facilitate garbage collection. Maintaining a
reference counter for each chunk is expensive and error-prone. Grouped
Mark-and-Sweep (GMS) uses a bitmap to mark which chunks in a container are used
by a backup.
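The capping behaviour described above can be sketched as follows: within one fixed-size segment, count how many chunks each old container contributes, keep the T most-referenced containers, and rewrite the duplicate chunks of the remaining containers. The segment handling follows the description in the text; class and field names are illustrative assumptions.

```java
import java.util.*;

/** Minimal sketch of the Capping idea: within one fixed-size segment (e.g., 20 MB),
 *  allow at most T referenced old containers and rewrite the duplicate chunks of the
 *  remaining, least-referenced containers. Names are illustrative assumptions. */
class CappingSketch {
    /** A chunk of the segment: the container it was found in, and whether it is a duplicate. */
    static class SegChunk {
        final long containerId; final boolean isDuplicate;
        SegChunk(long cid, boolean dup) { containerId = cid; isDuplicate = dup; }
    }

    static List<SegChunk> chunksToRewrite(List<SegChunk> segment, int capT) {
        // Count how many chunks of this segment each old container contributes.
        Map<Long, Integer> perContainer = new HashMap<>();
        for (SegChunk c : segment)
            if (c.isDuplicate)                          // unique chunks are written anyway
                perContainer.merge(c.containerId, 1, Integer::sum);

        if (perContainer.size() <= capT) return Collections.emptyList();

        // Keep the T containers holding the most chunks of the segment ...
        List<Long> ranked = new ArrayList<>(perContainer.keySet());
        ranked.sort((a, b) -> perContainer.get(b) - perContainer.get(a));
        Set<Long> kept = new HashSet<>(ranked.subList(0, capT));

        // ... and rewrite the duplicates that reference the other N - T containers.
        List<SegChunk> rewrite = new ArrayList<>();
        for (SegChunk c : segment)
            if (c.isDuplicate && !kept.contains(c.containerId))
                rewrite.add(c);
        return rewrite;
    }
}
```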
Performance evaluation
Four datasets, namely Kernel, VMDK, RDB, and Synthetic, are used for the
evaluation. Their characteristics are listed in Table 1. Each backup stream is
divided into variable-sized chunks via Content-Defined Chunking. Kernel,
downloaded from the web, is a commonly used public dataset. It consists of
258 consecutive versions of unpacked Linux kernel sources. Each version is
412.78 MB on average. Two consecutive versions are generally 99% identical except
when there are major revision upgrades. There are only a few self-references, and
hence sparse containers are dominant. VMDK comes from a virtual machine with
Ubuntu 12.04 LTS installed, which is a common use case in the real world. We
compiled source code, patched the system, and ran an HTTP server on the
virtual machine. VMDK consists of 126 full backups. Each full backup is 15.36
GB in size on average and 90-98% identical to its adjacent backups. Each backup
contains about 15% self-referred chunks. Out-of-order containers are dominant
and sparse containers are less severe. RDB consists of snapshots of a Redis
database. The database has 5 million records and occupies 5 GB. We ran YCSB to
update the database following a Zipfian distribution. The update ratio is 1% on
average. After each run, we archived the uncompressed dump.rdb file, which is the
on-disk snapshot of the database. In total, we obtained 212 versions of snapshots.
There is no self-reference, and hence sparse containers are dominant. Synthetic
was generated according to existing approaches: we simulated common
file system operations, such as file creation, deletion, and modification, and
finally obtained a 4.5 TB dataset with 400 versions. There is no self-reference in
Synthetic, and sparse containers are dominant.
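Since every backup stream is split with Content-Defined Chunking, the toy chunker below illustrates the general idea: a byte-wise rolling hash declares a boundary whenever its low bits are all zero, bounded by minimum and maximum chunk sizes. The hash function and the size parameters are assumptions for illustration only; they are not the chunker used in the evaluation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy content-defined chunker; parameters and hash are assumptions for illustration. */
class SimpleCdc {
    static final int MIN_SIZE = 2 << 10;            // assumed 2 KB minimum chunk size
    static final int MAX_SIZE = 16 << 10;           // assumed 16 KB maximum chunk size
    static final int MASK = (1 << 13) - 1;          // ~8 KB expected chunk size

    static List<byte[]> chunk(byte[] data) {
        List<byte[]> chunks = new ArrayList<>();
        int start = 0, hash = 0;
        for (int i = 0; i < data.length; i++) {
            hash = (hash << 1) + (data[i] & 0xff);  // simplistic rolling hash
            int len = i - start + 1;
            if ((len >= MIN_SIZE && (hash & MASK) == 0) || len >= MAX_SIZE) {
                chunks.add(Arrays.copyOfRange(data, start, i + 1));
                start = i + 1;
                hash = 0;                           // restart at the new boundary
            }
        }
        if (start < data.length)                    // trailing partial chunk
            chunks.add(Arrays.copyOfRange(data, start, data.length));
        return chunks;
    }
}
```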
History-Aware Rewriting Algorithm
At the beginning of a backup, HAR loads the IDs of all inherited sparse containers
to construct the in-memory structure Sinherited. During the backup, HAR
rewrites all duplicate chunks whose container IDs exist in Sinherited.
Additionally, HAR maintains another in-memory structure, Semerging (included in
the collected info in Figure 3), to monitor the utilizations of all the containers
referenced by the backup. Semerging is a set of utilization records, and each
record consists of a container ID and the current utilization of that container.
After the backup concludes, HAR removes from Semerging the records whose
utilizations are higher than the utilization threshold. Semerging then contains
the IDs of all emerging sparse containers. In most cases, Semerging can be flushed
directly to disk as the Sinherited of the next backup, because the size of
Semerging is generally small due to our second observation. However, the number of
emerging sparse containers occasionally spikes. A large number of emerging sparse
containers indicates that many fragmented chunks would have to be rewritten in the
next backup. That would shift the performance bottleneck to data writing and hurt
backup performance, which is of top priority. To address this problem, HAR sets a
rewrite limit, such as 5%, to avoid too many rewrites in the next backup. HAR uses
the rewrite limit to determine whether there are too many sparse containers in
Semerging: (1) HAR calculates an estimated rewrite ratio (defined as the size of
rewritten data divided by the backup size) for the next backup; specifically, HAR
divides the total referenced size of the containers in Semerging by the backup
size. (2) If the estimated rewrite ratio exceeds the rewrite limit, HAR keeps in
Semerging only the containers of lowest utilization such that the estimate stays
within the limit, leaving the remaining containers to be handled by later backups.
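The bookkeeping just described can be sketched as follows. The utilization threshold, the 5% rewrite limit, and the trimming of Semerging follow the text; the class layout and the byte accounting are illustrative assumptions, not the authors' implementation.

```java
import java.util.*;

/** Minimal sketch of HAR's bookkeeping: Sinherited drives rewrites during the backup,
 *  Semerging is collected from container utilizations, and the result is trimmed by the
 *  rewrite limit. Thresholds and structure are illustrative assumptions. */
class HarSketch {
    static final double UTIL_THRESHOLD = 0.5;   // containers referenced below 50% count as sparse
    static final double REWRITE_LIMIT  = 0.05;  // rewrite at most ~5% of the next backup

    private final Set<Long> inherited;          // Sinherited: sparse containers of the last backup
    private final Map<Long, Long> referencedBytes = new HashMap<>();  // basis of Semerging
    private long backupSize = 0;

    HarSketch(Set<Long> inheritedSparseContainerIds) {
        this.inherited = inheritedSparseContainerIds;
    }

    /** Unique (new) chunks only contribute to the backup size. */
    void onUniqueChunk(long chunkSize) { backupSize += chunkSize; }

    /** Duplicate chunks are rewritten if they live in an inherited sparse container. */
    boolean onDuplicateChunk(long containerId, long chunkSize) {
        backupSize += chunkSize;
        referencedBytes.merge(containerId, chunkSize, Long::sum);
        return inherited.contains(containerId);
    }

    /** After the backup: emerging sparse containers, trimmed so that the estimated
     *  rewrite ratio of the next backup stays within the rewrite limit. */
    Set<Long> emergingSparseContainers(long containerSize) {
        List<Map.Entry<Long, Long>> sparse = new ArrayList<>();
        for (Map.Entry<Long, Long> e : referencedBytes.entrySet())
            if ((double) e.getValue() / containerSize < UTIL_THRESHOLD)
                sparse.add(e);
        sparse.sort((a, b) -> Long.compare(a.getValue(), b.getValue()));  // sparsest first

        Set<Long> nextInherited = new LinkedHashSet<>();
        long estimatedRewrite = 0;
        for (Map.Entry<Long, Long> e : sparse) {
            if ((double) (estimatedRewrite + e.getValue()) / backupSize > REWRITE_LIMIT) break;
            estimatedRewrite += e.getValue();
            nextInherited.add(e.getKey());      // flushed to disk as Sinherited of the next backup
        }
        return nextInherited;
    }
}
```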
System Configuration:
HARDWARE REQUIREMENTS:
Hardware - Pentium
Speed - 1.1 GHz
RAM - 1 GB
Hard Disk - 20 GB
Floppy Drive - 1.44 MB
Key Board - Standard Windows Keyboard
Mouse - Two or Three Button Mouse
Monitor - SVGA
SOFTWARE REQUIREMENTS:
Operating System : Windows
Technology : Java and J2EE
Web Technologies : HTML, JavaScript, CSS
IDE : MyEclipse
Web Server : Tomcat
Tool kit : Android Phone
Database : MySQL
Java Version : J2SDK 1.5