Improving availability and reducing redundancy using deduplication of cloud storage system

Dissertation Phase-II Presentation On:
“Improving the availability and reducing
redundancy using deduplication of cloud storage system”
Presented by:
Mr. Dhanaraj S. Patil.
Under The Guidance Of:
Mrs. R.J. Deshmukh.

OUTLINE
• Cloud storage system
• Cloud of clouds
• Replication & Erasure code
• Problem Statement
• Achieved objectives
• System Architecture
• Experimental Setup
• Implementation & Result
• Conclusion

CLOUD STORAGE SYSTEM
• The digital data is stored in logical pools
• Public, private and hybrid
• Advantages:-
Pay-per-Use
Availability
• Disadvantages:-
Data outage
Vendor lock-in
• Example:-
Amazon S3, Windows Azure

CLOUD-OF-CLOUDS
• The digital data is stored in logical pools
• Multiple cloud vendors are combined behind a single access point
• Low cost
• No vendor lock-in
• Example:- DepSky

REPLICATION
• Creating multiple copies of data
• Widely used in cloud storage systems
• 3-replica strategy
• Improves reliability, fault-tolerance, accessibility

ERASURE CODE
• Data is broken into fragments, expanded and encoded with
redundant data pieces
• Consumes less storage than replication
• Data can be rebuilt from a sufficient subset of the fragments
• Drawback:- CPU-intensive
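
The fragment-and-rebuild idea can be sketched with the simplest possible erasure code: k data fragments plus one XOR parity fragment. This is only an illustration under stated assumptions. The function names `encode` and `rebuild` are hypothetical, and production systems typically use stronger codes (for example Reed-Solomon) that tolerate more than one lost fragment.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments (zero-padded) plus one parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)  # XOR of all data fragments
    return fragments + [parity]


def rebuild(fragments: List[Optional[bytes]], original_len: int) -> bytes:
    """Recover the data when at most one fragment (marked None) is lost."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    repaired = list(fragments)
    if missing:
        survivors = [f for f in fragments if f is not None]
        # XOR of all surviving fragments equals the lost fragment.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity fragment and the zero padding.
    return b"".join(repaired[:-1])[:original_len]
```

With k = 4 this stores 5 fragments for 4 fragments' worth of data (1.25 times the original size), compared with 3 times for the 3-replica strategy. The price is the encoding and rebuilding computation.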

LITERATURE SURVEY
• Ensuring Cloud data reliability with minimum replication by
proactive replica checking [2]
• Replication-based Load Balancing scheme [4]

PROBLEM STATEMENT
To develop a system that implements efficient
cloud storage using a data deduplication technique to avoid
the data redundancy problem.

ACHIEVED OBJECTIVES
• To study the different data distribution techniques in
cloud systems.
• To analyze the hybrid redundant [HyRD] data
distribution scheme.
• To design a system that addresses the data redundancy problem by
applying data deduplication with versioning.
• To compare the performance of the implemented system
with the existing system.

SYSTEM ARCHITECTURE
Modules:
• Data owner
• File verification
• File versioning
• Hybrid redundancy
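
A minimal sketch of how the file verification module could detect duplicates, assuming it compares the MD5 digest of an uploaded file against the digests already stored. The class `DedupStore` and its in-memory dictionaries are hypothetical stand-ins for the cloud back end and database, not the presented implementation.

```python
import hashlib


class DedupStore:
    """In-memory stand-in for the storage back end (hypothetical)."""

    def __init__(self):
        self.blobs = {}  # MD5 hex digest -> file content, stored once
        self.names = {}  # file name -> MD5 hex digest

    def upload(self, name: str, content: bytes) -> bool:
        """Store content; return True if new bytes were written,
        False if an identical file was already stored."""
        digest = hashlib.md5(content).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = content
        # The name always points at the digest, so duplicates cost no space.
        self.names[name] = digest
        return is_new

    def stored_bytes(self) -> int:
        """Total bytes actually held after deduplication."""
        return sum(len(blob) for blob in self.blobs.values())
```

Uploading the same content under two different names writes the bytes once; the second upload only records a new name for the existing digest.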

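Message Digest 5 algorithm [MD5]:
Step 1: Append padding bits
The message is padded so that its length in bits is congruent to 448 modulo 512.
Step 2: Append length
A 64-bit representation of the original message length is appended.
Step 3: Initialize MD buffer
A four-word buffer (A, B, C, D) holds the intermediate and final results.
Each word is 32 bits and is initialized as follows (hexadecimal, low-order byte first):
word A: 01 23 45 67
word B: 89 ab cd ef
word C: fe dc ba 98
word D: 76 54 32 10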
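
Steps 1 to 3 can be written out in a few lines, following RFC 1321 [21]. The helper name `md5_pad` and the dictionary `INIT_WORDS` are illustrative choices, not part of any library.

```python
import struct


def md5_pad(message: bytes) -> bytes:
    """Steps 1 and 2: pad to 448 mod 512 bits, then append the 64-bit length."""
    bit_len = (8 * len(message)) % (1 << 64)
    padded = message + b"\x80"  # a single 1 bit followed by seven 0 bits
    # Add zero bytes until the length is 56 mod 64 bytes (448 mod 512 bits).
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Append the original length in bits as a 64-bit little-endian integer.
    padded += struct.pack("<Q", bit_len)
    return padded


# Step 3: the four buffer words, read from the low-order-byte-first listing.
INIT_WORDS = {
    "A": int.from_bytes(bytes.fromhex("01234567"), "little"),
    "B": int.from_bytes(bytes.fromhex("89abcdef"), "little"),
    "C": int.from_bytes(bytes.fromhex("fedcba98"), "little"),
    "D": int.from_bytes(bytes.fromhex("76543210"), "little"),
}
```

The padded message is always a whole number of 512-bit (64-byte) blocks. Padding is applied even when the message is already 448 bits modulo 512, so a 56-byte message grows to 128 bytes.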

Message Digest 5 algorithm [MD5]
contd.
Step 4: Process message in 16-word blocks
Four auxiliary functions are defined, which are used in
processing the message in 512-bit blocks.
Step 5: Output
To produce the digest, output the words A, B, C, D in that order
(low-order byte of A first) and convert the result into hexadecimal.
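
The four auxiliary functions of step 4 and the output rule of step 5 can be sketched as follows, as defined in RFC 1321 [21]. The helper `words_to_hex` is a hypothetical name; `hashlib.md5` from the Python standard library is shown as the practical way to obtain a file digest.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF  # keep intermediate values within 32 bits


def F(x: int, y: int, z: int) -> int:
    return (x & y) | (~x & MASK & z)


def G(x: int, y: int, z: int) -> int:
    return (x & z) | (y & ~z & MASK)


def H(x: int, y: int, z: int) -> int:
    return x ^ y ^ z


def I(x: int, y: int, z: int) -> int:
    return y ^ (x | (~z & MASK))


def words_to_hex(a: int, b: int, c: int, d: int) -> str:
    """Step 5: emit A, B, C, D low-order byte first, as a hexadecimal string."""
    return struct.pack("<4I", a, b, c, d).hex()


def file_digest(content: bytes) -> str:
    """Digest of a file's content using the standard-library MD5."""
    return hashlib.md5(content).hexdigest()
```

In practice only `file_digest` is needed; the auxiliary functions are listed to make step 4 concrete.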

EXPERIMENTAL SETUP
1. Hardware Requirements
Processor: Pentium Dual-Core 2.50 GHz (or above)
Memory: 1 GB (or above)
2. Software Requirements
Operating System: Windows 7/8 and above
Front End & Back End: HTML, PHP
Database: MySQL

RESULTS
The following table describes the storage consumption
in the cloud, taking the file size into account. It
compares the storage space used by the existing
system and our system for a fixed-size file.

RESULTS
In file versioning, we create versions of files that
have the same file name but different content. The
version number is attached to the file name and a
new file is created. We compare the existing system and the
implemented system by uploading a file with the same name but
different content. The following figure shows the
version count of the file against the number of upload attempts
with different content, for the implemented
system and the existing system.
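
The versioning rule described above can be sketched as follows. The `_vN` naming pattern, the class `VersionedStore`, and the choice to return the existing version when identical content is uploaded again are assumptions made for illustration; the slides state only that a version number is attached to the file name.

```python
import hashlib
import os


class VersionedStore:
    """Tracks versions of files that share a name but differ in content."""

    def __init__(self):
        self.versions = {}  # file name -> list of (stored name, MD5 digest)

    def upload(self, name: str, content: bytes) -> str:
        """Return the stored name for this upload, creating a new
        version only when the content has not been seen under this name."""
        digest = hashlib.md5(content).hexdigest()
        history = self.versions.setdefault(name, [])
        for stored_name, known_digest in history:
            if known_digest == digest:
                return stored_name  # same name, same content: no new version
        stem, ext = os.path.splitext(name)
        stored_name = f"{stem}_v{len(history) + 1}{ext}"
        history.append((stored_name, digest))
        return stored_name

    def version_count(self, name: str) -> int:
        return len(self.versions.get(name, []))
```

Uploading `report.txt` twice with different content yields `report_v1.txt` and `report_v2.txt`, while repeating an upload with unchanged content leaves the version count as it was.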

RESULTS
The following figure shows a graphical analysis of file
uploading in the cloud, where the x-axis is the file size in KB and
the y-axis is the time in seconds.

APPLICATIONS
• Medical businesses such as hospitals, clinics and medical
stores
• E-learning in the educational field
• Company databases

CONCLUSION
Availability is one of the key constraints of a cloud
storage service that users must consider when uploading data to the cloud.
With a single cloud storage system, problems such as vendor lock-in and
service outages may arise. In the existing system, the inter-cloud system was
based on the hybrid redundancy distribution technique, but it still shows data
redundancy issues. The implemented system tries to solve this problem
with the help of MD5 and versioning.
The system applies several techniques to reduce the data
redundancy problem. The MD5 algorithm is used to
verify the hash values of files, and file versions are maintained
for availability and durability of the data. An experimental study shows
that the redundancy problem can be reduced and data availability maintained
with our approach. As future work, we plan to add security to
the system for data sharing and to provide access
control policies.

REFERENCES
[1] Bo Mao, Suzhen Wu and Hong Jiang, “Exploiting Workload Characteristics and Service Diversity to
Improve the Availability of Cloud Storage Systems”, IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED
SYSTEMS, Pages: 2010 – 2021, Year: 2016.
[2] Wenhao Li, Yun Yang, Dong Yuan, “Ensuring Cloud data reliability with minimum replication by
proactive replica checking”, IEEE TRANSACTIONS ON COMPUTERS, Pages: 1494 - 1506, Year: 2016.
[3] Maomeng Su, Lei Zhang, Yongwei Wu, Kang Chen, and Keqin Li, “Systematic Data Placement
Optimization in Multi-Cloud Storage for Complex Requirements”, IEEE TRANSACTIONS ON
COMPUTERS, Pages: 1964 –1977, Year: 2016.
[4] Amir Nahir, Ariel Orda, and Danny Raz, “Replication-based Load Balancing”, IEEE TRANSACTIONS
ON PARALLEL AND DISTRIBUTED SYSTEMS, Pages: 494 – 507, Year: 2016.
[5] Shiuan-Tzuo Shen, Hsiao-Ying Lin, and Wen-Guey Tzeng, “An Effective Integrity Check Scheme for
Secure Erasure Code-Based Storage Systems”, IEEE TRANSACTIONS ON RELIABILITY, Pages: 840 –
851, Year: 2015.
[6] Ayad F. Barsoum and M. Anwar Hasan, “Provable Multicopy Dynamic Data Possession in Cloud
Computing Systems”, IEEE TRANSACTIONS ON INFORMATION FORENSICS AND
SECURITY, Pages: 485 - 497, Year: 2015.
[7] Frederik Armknecht, Jens-Matthias Bohli, Ghassan O. Karame, Franck Youssef, “Transparent Data
Deduplication in the Cloud”, In Proceedings of the 22nd ACM SIGSAC Conference on Computer and
Communications Security, October 2015.
[8] N. Jayapandian, Dr. A.M.J. Md. Zubair Rahman, I. Nandhini, “A Novel Approach for Handling Sensitive
Data with Deduplication Method in Hybrid Cloud”, 2015 Online International Conference on Green
Engineering and Technologies (IC-GET 2015), Pages: 1 – 6, Year: 2015.
[9] Ghazal Riahi, “E-learning systems based on cloud computing: A Review”, Procedia Computer Science
62, 352 – 359, 2015.
[10] Hui Zhang, Guofei Jiang, Kenji Yoshihira, and Haifeng Chen, “Proactive Workload Management in
Hybrid Cloud Computing”, IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT,
Pages: 90 – 100, Year: 2014.
[11] X. Zhang, M. Tsugawa, Y. Zhang, H. Song, C. Cao, G. Huang, and J. Fortes, “Towards Model-Defined
Cloud of Clouds”, In Proceedings of the 17th International Conference on Model Driven Engineering
Languages and Systems (MODELS'14), pages 41–45, Sep. 2014.
[12] Osama Khan, Randal Burns, James Plank, William Pierce, Cheng Huang, “Rethinking Erasure Codes for
Cloud File Systems: Minimizing I/O for Recovery and Degraded Reads”, In Proceedings of the 10th
USENIX conference on File and Storage Technologies, Pages 20-20, February 2012.
[13] A. Jain and S. Chawla, “E-learning in the cloud”, International Journal of Latest Research in Science and
Technology 2(1): 478-481, 2013.
[14] Y. Ma, T. Nandagopal, K. Puttaswamy, and S. Banerjee, “An Ensemble of Replication and Erasure Codes
for Cloud File Systems”, In Proceedings of the 32nd IEEE International Conference on Computer
Communications (INFOCOM'13), pages 1276–1284, Apr. 2013.
[15] Cloud computing:- https://en.wikipedia.org/wiki/Cloud_computing
[16] Y. Wang, L. Alvisi, and Mike Dahlin, “Gnothi: Separating Data and Metadata for Efficient and Available
Storage Replication”, In Proceedings of the 2012 USENIX Annual Technical Conference (ATC'12), pages
413–424, Jun. 2012.
[17] Md. Alam Hossain, Md. Kamrul Islam, Subrata Kumar Das and Md. Asif Nashiry, “CRYPTANALYZING
OF MESSAGE DIGEST ALGORITHMS MD4 AND MD5”, International Journal on Cryptography and
Information Security (IJCIS), Vol. 2, No. 1, March 2012.
[18] DepSky:- http://cloud-of-clouds.github.io/depsky/
[19] Hussam Abu-Libdeh, Lonnie Princehouse, Hakim Weatherspoon, “RACS: A Case for Cloud Storage
Diversity”, In Proceedings of the 1st ACM symposium on Cloud computing, Pages 229-240, June 2010.
[20] Alysson Bessani, Miguel Correia, Bruno Quaresma, Fernando André, Paulo Sousa, “DEPSKY:
Dependable and Secure Storage in a Cloud-of-Clouds”, In Proceedings of the sixth conference on
Computer systems, Pages 31-46, April 2011.
[21] Rivest R., 1992, “The MD5 Message-Digest Algorithm”, RFC 1321, MIT LCS and RSA Data Security,
Inc.