International Journal of Trend in Scientific Research and Development (IJTSRD)
Volume 6 Issue 2, January-February 2022 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470
@ IJTSRD | Unique Paper ID – IJTSRD49416 | Volume – 6 | Issue – 2 | Jan-Feb 2022 Page 1331
File Sharing and Data Duplication Removal in
Cloud Using File Checksum
Gopi B, Murugan R
School of Computer Science and Information Technology,
Jain (deemed to be University), Bangalore, Karnataka, India
ABSTRACT
Data duplication removal uses a file checksum technique to
identify duplicate or redundant data rapidly and accurately.
Inaccurate results can be avoided by comparing the checksum of
each already existing file with that of the newly uploaded file.
Each file can be stored with multiple attributes such as file
name, date and time, checksum, user id, and so on. When the user
uploads a new file, the system generates the checksum of the file
and compares it with the checksums of the files that have already
been stored. If a match is found, the system updates the old
entry; otherwise a new entry is created in the database.
KEYWORDS: Database, Duplication, Entity, Data, Checksum,
Redundant, User id
How to cite this paper: Gopi B | Murugan R "File Sharing and Data
Duplication Removal in Cloud Using File Checksum" Published in
International Journal of Trend in Scientific Research and
Development (ijtsrd), ISSN: 2456-6470, Volume-6 | Issue-2,
February 2022, pp.1331-1333, URL:
www.ijtsrd.com/papers/ijtsrd49416.pdf
Copyright © 2022 by author(s) and International Journal of Trend
in Scientific Research and Development Journal. This is an Open
Access article distributed under the terms of the Creative
Commons Attribution License (CC BY 4.0)
(http://creativecommons.org/licenses/by/4.0)
1. INTRODUCTION
Data is a collection of information, and the amount of
data in the digital universe is increasing constantly.
One widely cited estimate suggested that by the end of
2020 each person would create about 1.7 megabytes of
data every second. It is also estimated that about 2.5
quintillion bytes of data are produced per day. The
reasons behind the growth of duplicate data include:
Multiple backups of data or files by a single person.
Misuse of social media.
The hacking of organizations' systems in 9/11 and the
loss of data caused by illegal activity proved that loss
of data is a major problem for organizations. Such
events forced organizations to implement data backup
systems in order to preserve their important data.
Organizations started keeping regular backups of
their data, such as email, video and audio, which
increased their storage requirements. While backing up
data regularly, they end up storing duplicate data
multiple times, which wastes storage.
As data increases constantly, storing and managing it
becomes more difficult. More data requires more
storage, and more storage means more cost, because the
hardware or storage units have to be expanded. Simply
adding storage units is not a solution, because it is
not known in advance how many units will have to be
added. Adding more storage units also makes the system
bulky and more costly.
The solution to the above problem is the proper
implementation of a data duplication removal system.
The data duplication removal method stores data or a
file in the system only if it has not been stored
previously. If a match is found, it updates the old
entry instead. The system thus removes duplicate data
quickly and saves precious storage.
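The store-or-update rule described above can be sketched as follows. This is only a minimal illustration in Python, not the authors' implementation: the class name DedupStore, its fields, and the in-memory dictionaries are assumptions made for the example, and the byte counters are added only to make the storage saving visible.

```python
import hashlib


class DedupStore:
    """Toy in-memory store keyed by content checksum (hypothetical helper)."""

    def __init__(self):
        self.blobs = {}          # checksum -> file content, kept once
        self.names = {}          # checksum -> set of file names seen
        self.bytes_uploaded = 0  # total size of everything users sent
        self.bytes_stored = 0    # total size actually kept

    def upload(self, file_name, data):
        digest = hashlib.md5(data).hexdigest()
        self.bytes_uploaded += len(data)
        if digest in self.blobs:
            # Match found: update the old entry, store nothing new.
            self.names[digest].add(file_name)
            return "updated"
        # No match: create a new entry and keep the content.
        self.blobs[digest] = data
        self.names[digest] = {file_name}
        self.bytes_stored += len(data)
        return "created"


store = DedupStore()
print(store.upload("report.pdf", b"quarterly figures"))         # created
print(store.upload("report-backup.pdf", b"quarterly figures"))  # updated
print(store.bytes_uploaded, store.bytes_stored)
```

The second upload has the same content under a different name, so only the first copy consumes storage.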
2. SURVEY MOTIVATION
"Di Pietro, Roberto, and Alessandro Sorniotti"
discussed the security concerns raised by
de-duplication. To address these concerns, the
authors use the idea of Proof of Ownership (POW). A
POW scheme is intended to let a server verify
whether a client actually possesses a file.
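The general idea of a server checking possession of a file can be illustrated with a simple challenge-response exchange. The sketch below is a deliberately simplified assumption written for this illustration and is not the scheme of the cited authors: the server asks for the bytes at randomly chosen offsets, and only a client that holds the file can answer. The function names make_challenge, respond and verify are hypothetical.

```python
import secrets


def make_challenge(file_size, num_positions=16):
    """Server side: pick random byte offsets within the file."""
    return [secrets.randbelow(file_size) for _ in range(num_positions)]


def respond(data, challenge):
    """Client side: return the bytes found at the challenged offsets."""
    return bytes(data[i] for i in challenge)


def verify(stored, challenge, response):
    """Server side: compare the response with its own copy of the file."""
    expected = respond(stored, challenge)
    return secrets.compare_digest(expected, response)


server_copy = b"contents of a file the server already stores"
challenge = make_challenge(len(server_copy))
print(verify(server_copy, challenge, respond(server_copy, challenge)))
```

A client that knows only the checksum of the file, and not its content, cannot compute the response, which is the property such a check is meant to provide.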
According to "Atish Kathpal, Matthew John and
Gaurav Makkar", data duplication removal is the
method of eliminating duplicate data from storage
devices in order to minimize the space consumed on
those devices. Although the concepts were sound, their
system did not work as intended because of poor
management of hardware devices and poor ease of use,
which resulted in underperformance of the system.
2.1. GOAL
Much work has been done in the past to solve the
storage problem caused by data duplication. Data
duplication has been a major problem, and the
technology developed in the past was not able to solve
it because the technology was not managed properly.
2.2. LIMITATION
More processing time.
Chance of false result.
Not user friendly.
System maintenance is difficult.
2.3. KEYWORDS
Cloud computing, data storage, file checksum
algorithms, computational infrastructure, duplication.
3. SURVEY OUTCOMES
Data duplication increases the amount of unwanted
data in the storage unit by storing multiple copies of
the same file. The data duplication removal technique
uses file checksums to find duplicate or redundant
data quickly. It calculates the checksum of a file when
the file is uploaded and compares the newly calculated
checksum with the checksums of the files that are
already stored in the database. If the file is already
present, the system updates the existing entry;
otherwise it makes a new entry for the file. This
system uses the MD-5 hash algorithm to detect
duplicate files. MD-5 refers to the Message Digest
algorithm, which produces a 128-bit hash.
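A minimal sketch of this upload check is given below. It is an illustration under stated assumptions, not the system described in the paper: Python and its standard sqlite3 module stand in for the application language and database, and the table name files, its columns, and the helper functions file_md5, open_db and register_upload are hypothetical. The file is read in chunks so that large uploads need not fit in memory.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


def file_md5(path, chunk_size=65536):
    """Compute the MD5 checksum of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()  # 32 hex characters, i.e. 128 bits


def open_db(db_path=":memory:"):
    """Open the database and create the (assumed) files table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               checksum    TEXT PRIMARY KEY,
               file_name   TEXT NOT NULL,
               user_id     TEXT NOT NULL,
               uploaded_at TEXT NOT NULL
           )"""
    )
    return conn


def register_upload(conn, path, file_name, user_id):
    """Return True if a new entry was created, False if an old one was updated."""
    digest = file_md5(path)
    now = datetime.now(timezone.utc).isoformat()
    found = conn.execute(
        "SELECT 1 FROM files WHERE checksum = ?", (digest,)
    ).fetchone()
    if found:
        # Checksum already stored: update the existing entry.
        conn.execute(
            "UPDATE files SET file_name = ?, user_id = ?, uploaded_at = ? "
            "WHERE checksum = ?",
            (file_name, user_id, now, digest),
        )
    else:
        # New checksum: make a new entry for the file.
        conn.execute(
            "INSERT INTO files (checksum, file_name, user_id, uploaded_at) "
            "VALUES (?, ?, ?, ?)",
            (digest, file_name, user_id, now),
        )
    conn.commit()
    return not found
```

Making the checksum the primary key means the database itself rejects a second row for the same content, which keeps the lookup a single indexed query.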
Advantages:
Faster file searching.
Reduced storage consumption through the elimination
of data redundancy.
Easy file download and upload.
4. CONCLUSION
This technique focuses on developing a web-based
application that can find redundant data quickly
and easily using the file checksum technique. The
Message Digest (MD-5) algorithm is used to calculate
the checksum of both already existing files and new
files. The checksum also serves as a basic integrity
check on users' files; because MD-5 is a hash function
rather than an encryption algorithm, confidentiality
of the files would need a separate mechanism. Hence,
this system removes duplicate files easily and quickly.
5. REFERENCES
[1] Di Pietro, Roberto and Alessandro Sorniotti,
"Proof of ownership for de-duplication
systems: A secure, scalable, and efficient
solution", Computer Communications, 15 May
2016.
[2] M. Bellare, S. Keelveedhi, and T. Ristenpart,
"Dupless: Server aided encryption for
deduplicated storage", USENIX Security
Symposium, 2013.
[3] Harnik, Danny, Alexandra Shulman-Peleg and
Benny Pinkas, "Side channels in cloud services,
the case of deduplication in cloud storage",
IEEE Security & Privacy 8, 2014.
[4] Atish Kathpal, Matthew John and
Gaurav Makkar, "Distributed Duplicate
Detection in Post-Process Data
De-duplication", Conference: HiPC, 2011.
[5] X. Zhao, Y. Zhang, Y. Wu, K. Chen, J.
Jiang, K. Li, "Liquid: A Scalable Deduplication
File System for Virtual Machine Images", IEEE
Transactions on Parallel and Distributed
Systems, January 2013.
[6] Stephen J. Bigelow, "Data Deduplication
Explained: http://searchgate.org", February
2018.
[7] http://www.computerweekly.com/report/Data-duplication-technology-review
[8] https://nevonprojects.com
[9] Morris Dworkin, 2015; NIST Policy on Hash
Functions; Cryptographic Technology Group,
https://csrc.nist.gov/projects/hash-functions/nist-policy-on-hash-functions;
August 5, 2015.
[10] Nimala Bhadrappa, Mamatha G. S. 2017,
Implementation of De-Duplication Algorithm,
International Research Journal of Engineering
and Technology (IRJET), Volume 04, Issue
09.
[11] O'Brien, J. A. & Marakas, G. M. (2011).
Computer Software. Management Information
Systems 10th ed. 145. McGraw-Hill/Irwin.
[12] Peter Mell; The NIST Definition of Cloud
Computing, National Institute of Standards and
Technology, NIST Special Publication 800-145.
[13] PHP 5 tutorials; W3Schools,
https://www.w3schools.com/pHP/default.asp;
accessed June 2018.
[14] Rivest R., 1992, The MD5 Message Digest
Algorithm. RFC 1321,
http://www.ietf.org/rfc/rfc1321.txt
[15] Sandeep Sharma, 2015; 15 Best PHP Libraries
Every Developer Should Know;
https://www.programmableweb.com/news/15-best-php-libraries-every-developer-should-know/analysis/2015/11/18;
accessed June 12, 2018.
[16] Single Instance Storage in Microsoft Windows
Storage Server 2003 R2, Technical White Paper,
published May 2006; archived 2007-01-04 at the
Wayback Machine: https://archive.org/web;
accessed September 2018.
[17] Stephen J. Bigelow, 2007, Data Deduplication
Explained: http://searchgate.org; accessed
February 2018.
[18] Wenying Zeng, Yuelong K. O, Wei S. (2009),
Research on Cloud Storage Architecture and
Key Technologies, ICIS 2009 Proceedings of
the 2nd International Conference on Interaction
Sciences: Information Technology, Culture and
Human, pages 1044-1048.
[19] What is PHP? PHP User contributory notes;
http://php.net/manual/en/intro-whatis.php;
accessed June 6, 2018.
[20] X. Zhao, Y. Zhang, Y. Wu, K. Chen, J. Jiang,
K. Li, "Liquid: A scalable deduplication file
system for virtual machine images", IEEE
Transactions on Parallel and Distributed
Systems, vol. 25, no. 5, pp. 1257-1266,
May 2014.

Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 

Recently uploaded (20)

Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 

File Sharing and Data Duplication Removal in Cloud Using File Checksum

  • 1. International Journal of Trend in Scientific Research and Development (IJTSRD), Volume 6 Issue 2, January-February 2022. Available Online: www.ijtsrd.com, e-ISSN: 2456-6470. @ IJTSRD | Unique Paper ID – IJTSRD49416 | Volume – 6 | Issue – 2 | Jan-Feb 2022, Page 1331

File Sharing and Data Duplication Removal in Cloud Using File Checksum
Gopi B, Murugan R
School of Computer Science and Information Technology, Jain (deemed to be University), Bangalore, Karnataka, India

ABSTRACT
Data duplication removal uses the file checksum technique to identify duplicate or redundant data rapidly and accurately. Inaccurate results are possible, but they can be avoided by comparing the checksum of each already existing file with that of the newly uploaded file. Each file is stored with multiple attributes such as file name, date and time, checksum, user id, and so on. When a user uploads a new file, the system generates the checksum of the file and compares it with the checksums of the files that have already been stored. If a match is found, the old entry is updated; otherwise a new entry is created in the database.

KEYWORDS: Database, Duplication, Entity, Data, Checksum, Redundant, User id

How to cite this paper: Gopi B | Murugan R "File Sharing and Data Duplication Removal in Cloud Using File Checksum" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-6 | Issue-2, February 2022, pp.1331-1333, URL: www.ijtsrd.com/papers/ijtsrd49416.pdf

Copyright © 2022 by author(s) and International Journal of Trend in Scientific Research and Development Journal. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by/4.0)

1. INTRODUCTION
A collection of information is known as data. Data is increasing constantly in the digital universe. A study suggests that by the end of 2020 each person would create 1.7 megabytes of data.
The rate of data production is also about 2.5 quintillion bytes per day. The reasons behind the growth of duplicate data include multiple backups of the same data or file by a single person and misuse of social media.

The hacking of organizational systems in 9/11 and the loss of data caused by illegal activity showed that data loss is a major problem for organizations. These events forced organizations to implement data backup systems in order to preserve their important data. Organizations started keeping regular backups of data such as email, video and audio, which increased their storage requirements. By backing up the data regularly, they end up storing the same data multiple times, which wastes storage.

As data increases constantly, storing and managing it becomes more difficult. More data requires more storage, and more storage costs more because the hardware or storage units have to be expanded. Simply adding storage units is not a solution, because it is not known in advance how much storage will have to be added, and adding more storage units makes the system bulkier and more costly.

The solution to the above problem is a properly implemented data duplication removal system. The data duplication removal method stores a file in the system only if it has not been stored previously. If a match is found, the method updates the old entry instead. The system therefore removes duplicate data quickly and saves precious storage.

2. SURVEY MOTIVATION
Di Pietro, Roberto, and Alessandro Sorniotti discussed the security concerns raised by de-duplication and addressed them using the idea of Proof of Ownership (POW). POWs are intended to let a server verify whether or not a client possesses a file.
  • 2. According to Atish Kathpal, Matthew John and Gaurav Makkar, data duplication removal is the method of eliminating duplicate data from storage devices in order to minimize the consumption of storage space. The concepts were sound, but the systems did not work as intended because of poor management of hardware devices and poor ease of use, which resulted in underperformance.

2.1. GOAL
Much work has been done in the past to solve the storage problem caused by data duplication. Data duplication has remained a major problem, and the technology developed in the past was not able to solve it because of improper management of the technology.

2.2. LIMITATION
The limitations of the earlier systems are: more processing time; chance of false results; not user friendly; difficult system maintenance.

2.3. KEYWORDS
Cloud computing, data storage, file checksum algorithms, computational infrastructure, duplication.

3. SURVEY OUTCOMES
Data duplication increases the amount of unwanted data in the storage unit by storing multiple copies of the same file. The data duplication removal technique uses file checksums to find duplicate or redundant data quickly. The technique calculates the checksum of a file when the file is uploaded and compares the newly calculated checksum with the checksums of the files that are already stored in the database. If the file is already present, the existing entry is modified; otherwise a new entry is made for the file. The system uses the MD5 hash algorithm to detect duplicate files. MD5 refers to the Message Digest algorithm, which is a 128-bit hash algorithm.

Advantages: faster file searching; reduced storage space through the elimination of data redundancy; ease of downloading and uploading files.

4. CONCLUSION
This technique focuses on developing a web-based application that can find redundant data quickly and easily using the file checksum technique. The Message Digest (MD5) algorithm is used to calculate the checksums of both the already existing files and each new file. Because MD5 is a hash function rather than an encryption algorithm, the checksum serves to identify duplicates and to check that the users' valuable files are unchanged. Hence, this system removes duplicate files easily and quickly.

5. REFERENCES
[1] Di Pietro, Roberto and Alessandro Sorniotti, "Proof of ownership for de-duplication systems: A secure, scalable, and efficient solution", Computer Communications, 15 May 2016.
[2] M. Bellare, S. Keelveedhi, and T. Ristenpart, "Dupless: Server aided encryption for deduplicated storage", USENIX Security Symposium, 2013.
[3] Harnik, Danny, Alexandra Shulman-Peleg and Benny Pinkas, "Side channels in cloud services, the case of deduplication in cloud storage", IEEE Security & Privacy 8, 2014.
[4] Atish Kathpal, Matthew John and Gaurav Makkar, "Distributed Duplicate Detection in Post-Process Data De-duplication", Conference: HiPC, 2011.
[5] X. Zhao, Y. Zhang, Y. Wu, K. Chen, J. Jiang, K. Li, "Liquid: A Scalable Deduplication File System for Virtual Machine Images", IEEE Transactions on Parallel and Distributed Systems, January 2013.
[6] Stephen J. Bigelow, "Data Deduplication Explained", http://searchgate.org, February 2018.
[7] http://www.computerweekly.com/report/Data-duplication-technology-review
[8] https://nevonprojects.com
[9] Morris Dworkin, 2015; NIST Policy on Hash Functions; Cryptographic Technology group, https://csrc.nist.gov/projects/hash-functions/nist-policy-on-hash-functions, August 5, 2015; "National Institute of Standard and Technology NIST Special Publication 800-145".
  • 3. [10] Nimala Bhadrappa, Mamatha G. S., 2017, Implementation of De-Duplication Algorithm, International Research Journal of Engineering and Technology (IRJET), Volume 04, Issue 09.
[11] O'Brien, J. A. & Marakas, G. M. (2011). Computer Software. Management Information Systems, 10th ed., 145. McGraw-Hill/Irwin.
[12] Peter Mell; The NIST Definition of Cloud Computing, National Institute of Standard and Technology, NIST Special Publication 800-145.
[13] PHP 5 tutorials; W3Schools, https://www.w3schools.com/pHP/default.asp. Accessed June 2018.
[14] Rivest R., 1992, The MD5 Message Digest Algorithm. RFC 1321, http://www.ietf.org/rfc/rfc1321.txt
[15] Sandeep Sharma, 2015; 15 Best PHP Libraries Every Developer Should Know; https://www.programmableweb.com/news/15-best-php-libraries-every-developer-should-know/analysis/2015/11/18; accessed June 12, 2018.
[16] Single Instance Storage in Microsoft Windows Storage Server 2003 R2. Archived 2007-01-04 at the Wayback Machine: https://archive.org/web. Technical White Paper, published May 2006, accessed September 2018.
[17] Stephen J. Bigelow, 2007, Data Deduplication Explained: http://searchgate.org; accessed February 2018.
[18] Wenying Zeng, Yuelong K. O, Wei S. (2009), Research on Cloud Storage Architecture and Key Technologies, ICIS 2009 Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, pages 1044-1048.
[19] What is PHP? PHP User contributory notes; http://php.net/manual/en/intro-whatis.php. Accessed June 6, 2018.
[20] X. Zhao, Y. Zhang, Y. Wu, K. Chen, J. Jiang, K. Li, "Liquid: A scalable deduplication file system for virtual machine images", IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 5, pp. 1257-1266, May 2014.
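
The upload flow described in the survey outcomes (compute the MD5 checksum of an uploaded file, look it up among the stored checksums, then update the existing entry or create a new one) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `md5_checksum`, `FileEntry` and `DedupStore` are hypothetical, and an in-memory dictionary stands in for the database the paper refers to.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def md5_checksum(data: bytes) -> str:
    """Return the 128-bit MD5 digest of data as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


@dataclass
class FileEntry:
    # Attributes named in the abstract: file name, date and time, checksum, user id.
    file_name: str
    user_id: str
    checksum: str
    uploaded_at: datetime


class DedupStore:
    """Hypothetical in-memory stand-in for the database table of stored files."""

    def __init__(self) -> None:
        self._entries: dict[str, FileEntry] = {}  # keyed by checksum
        self._contents: dict[str, bytes] = {}     # one stored copy per checksum

    def __len__(self) -> int:
        return len(self._entries)

    def upload(self, file_name: str, user_id: str, data: bytes) -> bool:
        """Store an upload. Return True if a new entry was created,
        False if an entry with the same checksum was updated instead."""
        checksum = md5_checksum(data)
        now = datetime.now(timezone.utc)
        existing = self._entries.get(checksum)
        if existing is not None:
            # Match found: update the old entry, keep the single stored copy.
            existing.file_name = file_name
            existing.user_id = user_id
            existing.uploaded_at = now
            return False
        # No match: create a new entry and store the content once.
        self._entries[checksum] = FileEntry(file_name, user_id, checksum, now)
        self._contents[checksum] = data
        return True


store = DedupStore()
print(store.upload("report.txt", "user1", b"hello"))  # new entry
print(store.upload("copy.txt", "user2", b"hello"))    # same content, entry updated
print(len(store))                                     # number of distinct files
```

Keying the store by checksum makes the duplicate check a single dictionary lookup. Since MD5 is not collision resistant, a deployment that must withstand deliberately crafted files might swap in `hashlib.sha256` without changing the rest of the flow.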